And yet…
A post as unnecessary as it is important
I don’t remember precisely the day when I first heard that old statement, the one that already sounds like a cliché: if an information system lets junk in, it will only produce junk. It is just obvious. Achieving adequate levels of data quality is essential. One would think that insisting on this is superfluous. And yet…
Tax administrations obtain data from different and increasingly varied sources of information. In many cases, taxpayers prepare and send, in various formats, details of withholdings and payments made, their electronic books and documents, interest paid and received, employee payroll reports, remittances abroad, operations with related parties, dividend payments, insurance premiums, and many other items. It is precisely this enormous amount of information that makes it possible to develop control programs of massive scope and, for an already significant number of administrations, pre-filled returns.
It is worth recalling the efforts made over the years by tax administrations to move from paper-based returns to the current stage, where electronic returns prevail. Errors due to faulty calculations, use of the wrong rates, illegible handwriting, overlapping boxes and imperfect identification data have decreased as we moved from manual methods to software-assisted mechanisms that generate disks or files to be incorporated. The possibility of filing a return online has made it possible, for example, to validate the use of credits from other periods or to be alerted immediately to inconsistencies with other information already in the possession of the administration. The quality of the information in the returns has improved significantly over the years.
But sometimes, as the saying goes, the priest soon forgets that he was once a sacristan, and after receiving so much information from taxpayers with few or very few errors, we may assume that any new information sent electronically will have the same desirable level of quality. The truth is that every new piece of information tends to be more complex: it will be reported by more informants, the volumes will be bigger, and the data must be sent at a time very close to the transactions (or even before they are executed), leaving the process without time or space for reviewing, validating and correcting it. We assume that hundreds, thousands or hundreds of thousands of taxpayers are able, armed with a technical guide and a couple of manuals, to understand an XSD schema perfectly, develop bug-free software that extracts information from their systems, transform and process it, prepare an XML file, sign it electronically, and transmit it to the administration. And that they do all that without errors, even the first time.
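To give a sense of what that first technical hurdle looks like from the administration's side, here is a minimal sketch, assuming Python with the lxml library and hypothetical file names, of validating an incoming XML file against the published XSD before anything else is done with it. Schema validation of this kind only catches structural problems; it says nothing about whether the amounts, identifiers or dates make business sense, which is exactly why the later validations matter.

```python
# Minimal sketch: validate an incoming XML file against the published XSD.
# Assumes Python with the lxml library; the file names are hypothetical.
from lxml import etree

def validate_against_xsd(xml_path: str, xsd_path: str) -> bool:
    """Return True if the XML document conforms to the XSD schema."""
    schema = etree.XMLSchema(etree.parse(xsd_path))
    document = etree.parse(xml_path)
    if schema.validate(document):
        return True
    # Report each structural error so the informant can receive precise feedback.
    for error in schema.error_log:
        print(f"line {error.line}: {error.message}")
    return False

if __name__ == "__main__":
    validate_against_xsd("invoice.xml", "invoice_schema.xsd")
```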
Then, some time later, after a period that is often not negligible, when we try to use that information for control purposes, we find that it contains so many errors that cleaning it up would require a disproportionate effort, and that correcting the habits already formed by hundreds, thousands or hundreds of thousands of informants, who had received no feedback, reprimand or fine for their previous submissions, would become a project in itself.
Obtaining information from third parties is something that, in my opinion, is going to increase. It will come from other state agencies; from taxpayers who are starting to use electronic documents (electronic invoices, for example), e-books and e-reports whose details are sent to the administration; from other tax administrations after the implementation of automatic information exchange agreements; from multinationals that report their operations country by country; and perhaps from parties less equipped to prepare an XML file, such as professional associations, churches or non-profit foundations.
There is a strong temptation to establish only a minimum set of validations before proceeding to accept, identify and store the files and their data. Control, data cross-checks and business intelligence would come later, perhaps much later. We must avoid at all costs the attitude of “let’s look at this later”.
In my opinion, for each new set of information it is necessary to involve from the beginning the areas of the administration that could use it for control purposes, to incorporate all the elements necessary for its use, and to include every possible validation. Most importantly: the first control tasks should verify that informants report the right thing, at the right time, through the right medium, and only once the information has reached a sufficient level of reliability should we move on to tax control.
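As a rough illustration of that ordering (a sketch only, with hypothetical rule names and record fields, not any administration's actual schema), the pipeline below runs the compliance checks first, and attempts the tax-content checks only once those pass.

```python
# Minimal sketch of a layered validation pipeline; the rule names and record
# fields are hypothetical.
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], Tuple[bool, str]]  # returns (passed, problem description)

# Layer 1: the right informant, the right period, the right medium.
COMPLIANCE_RULES: List[Rule] = [
    lambda r: (bool(r.get("informant_id")), "missing informant identification"),
    lambda r: (bool(re.fullmatch(r"\d{4}-\d{2}", str(r.get("period", "")))),
               "missing or malformed reporting period"),
    lambda r: (r.get("format") == "xml", "submission not in the required medium"),
]

# Layer 2: does the tax content itself look consistent? (assumes numeric fields)
CONTENT_RULES: List[Rule] = [
    lambda r: (float(r.get("tax", 0)) <= float(r.get("base", 0)),
               "reported tax exceeds the taxable base"),
]

def validate(record: Record) -> List[str]:
    """Run compliance rules first; only a compliant submission gets content checks."""
    problems = [msg for ok, msg in (rule(record) for rule in COMPLIANCE_RULES) if not ok]
    if problems:
        return problems  # feedback to the informant stops here
    return [msg for ok, msg in (rule(record) for rule in CONTENT_RULES) if not ok]

print(validate({"informant_id": "X123", "period": "2024-06", "format": "xml",
                "base": 100.0, "tax": 119.0}))
```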
In the process of improving quality, I think a Bayesian scheme can be implemented. Once the data has passed all the validations, it is possible to estimate the probability that it is correct, and to decide whether to keep trying to improve that quality, or to stop, depending on how high the probability of incorrect data remains. Additionally, when new information is obtained, the probability changes, and with it the decision on what action to take. For example, in electronic invoicing, even if the format, the identification of buyer and seller, the details of the goods, the tax calculations and the totals all seem correct, an invoice with a value unusually high for that taxpayer (say, more than three standard deviations above the average of previous operations) would have a high probability of containing a reporting error, and consequently the administration, after a prudent time, should send a message informing the taxpayer of the possible error. In another case, an invoice that satisfies the validation rules and is within the limits could be considered to have a low probability of errors; however, if the administration subsequently learns that the buyer had died a few months earlier, the probability of problems would be recalculated, and this time a very different situation would emerge.
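To make the idea concrete, here is a minimal sketch in Python; the prior probability, the likelihood values and the three-standard-deviation threshold are illustrative assumptions, not calibrated figures. The point is only that each piece of evidence updates the probability of error, and the administration acts once that probability crosses a threshold it considers prudent.

```python
# Minimal sketch of the Bayesian idea above: start from a prior probability that
# an invoice contains an error and update it as new evidence arrives.
# The prior, the likelihoods and the 3-sigma threshold are illustrative assumptions.
from statistics import mean, stdev

def bayes_update(prior: float, p_evidence_if_error: float, p_evidence_if_ok: float) -> float:
    """P(error | evidence) via Bayes' rule."""
    numerator = p_evidence_if_error * prior
    return numerator / (numerator + p_evidence_if_ok * (1.0 - prior))

def error_probability(amount: float, previous_amounts: list[float], buyer_deceased: bool) -> float:
    # Assumed prior: a small share of formally valid invoices still contain errors.
    p_error = 0.02

    # Evidence 1: amount more than three standard deviations above this taxpayer's history.
    if len(previous_amounts) >= 2 and amount > mean(previous_amounts) + 3 * stdev(previous_amounts):
        # Assumed likelihoods: unusual amounts are far more frequent among erroneous invoices.
        p_error = bayes_update(p_error, p_evidence_if_error=0.60, p_evidence_if_ok=0.01)

    # Evidence 2: the buyer was already deceased at the date of the operation.
    if buyer_deceased:
        p_error = bayes_update(p_error, p_evidence_if_error=0.90, p_evidence_if_ok=0.001)

    return p_error

# A formally valid invoice, roughly ten times the usual amount, buyer later found to be deceased:
history = [100.0, 120.0, 95.0, 110.0, 105.0]
print(error_probability(1000.0, history, buyer_deceased=True))  # close to 1, so flag for review
```

A threshold on that probability would then drive the decision: store the document quietly, ask the taxpayer about a possible error, or open a deeper review.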
It is logical. We might think it could not be otherwise. But these things happen. It is something like the first verses of “Y sin embargo” (“And yet”), that magnificent song by Joaquín Sabina: “they told me a thousand times, but I never wanted to pay attention; by the time the tears came, you were already deep within my heart.”
Greetings and good luck
1 comment
I vote for “Important” not “Unnecessary.”
Somewhat related, one of your comments reminded me of a recent conversation I had with a member-country tax commissioner. The point being discussed was, “Why are we amassing sooo much data if we don’t have staff with sufficient skills to retrieve it for any practical use?”
[Then, some time later, after a period that is often not negligible, when we try to use that information for control purposes, we find that it contains so many errors…]
So, before, we stacked all the paper tax declarations, schedules, supplements and attachments in the tax office hallways. We required tons of documents from taxpayers so we could properly “control” their business activities. Chances are we only looked at 1/1000 of 1 percent of those documents; it was too burdensome to look through even half of that stuff. And then along came progress; now we stack it in our servers.
Going back to my conversation with the commissioner, the issue is that there are terabytes of very useful data in our systems to assist us with complex audits. The trouble is: is our tax inspector sufficiently skilled to prepare programs to extract that data? To do some good data mining before he or she goes out to a field audit? I would guess that in most cases the answer is “no.”
And therein lies the problem: how do developing countries ensure a “good” audit of taxpayers when today the journals, ledgers, invoices and just about everything else are automated? Any ideas?