Navigating The Data “In Natura”
In the year 2000, I was hired to work as a consultant in a State Public Agency in Brazil with the mission of developing products for the Examination Area.
At that time, a large software company had been developing for more than two years, a Data Warehouse to serve the Examination Department, but no results seemed visible within the time limits of the project.
When analyzing the generated documents, we verified that this was merely a documentation of intentions following the standards of the company’s methodology and that it reflected a specification made by the users, lacking, therefore, greater technical details necessary to achieve the project.
Given the urgency of results, we then took another approach considering the following premises regarding the data required for the project:
As for the necessary requirements for servicing the area we considered that:
We then proposed the following solution:
The system works with successive refinements: A research consists of Elements, as a result of selections of attributes of a Theme. The inclusion of new attributes generates a new group which is included in the initial group, and so on. At any time, the research can be saved, as it is already the final result, or for use in other research, either for retrieval or presentation of data.
At this point some readers must be concerned with the proposal for accessing operational data. The idea is to use the same structures; therefore, in the case under analysis, the same operational data were used, due to the limited number of users who access the system and mainly by an implementation of the system known as Base Set. The Base Set is a set used as a reference, starting, that is, any result set always belongs or is contained in the Base Set.
Entering a Theme such as Collection, Electronic Invoices, among others, without a Base Set can be time-consuming for a system that should be repetitive. Therefore, if the Base Set has a few thousand Elements and the database is well configurated, there are no major problems.
Example:
We consider the existence of the Cadaster Theme, the Taxpayers´ Economic Activities Theme, the Collection Theme and the Imports Theme:
The objective is to obtain the Taxpayers identified by the CNPJ from a certain range of capital stock, a set of Economic Activities, who originated more than a certain amount and paid less than a tax amount in a given reference period;
In the 4 themes we have the desired result, a list of CNPJ that correspond to the example. The user could perform any of the other operations in the sets obtaining the sets as needed, or follow another navigation strategy. In addition, if the number of elements does not meet the requirements, the user may adjust the capital stock or use other economic activity codes or other parameters to include or eliminate taxpayers from the selected set.
Once the appropriate set of elements is obtained, except for a research résumé, history, this set can be used at any time to extract data from any themes configurated in the system, or serve as a control for the generation of cases and issuance of service orders for the examination teams, or serve as sets to participate in other operations with new sets.
This system became available in its initial version in 6 months and the result was so satisfactory that it came to be used by the Office of the Secretary of Revenue to extract managerial information from the collection area, among others.
In the initial version implemented this system was known as “PLAFIS – Examination Planning – Management Module”. Later, it was known as “JONAS – Just Online Navigation Analysis and Selection System”.
The Story Repeats Itself
Later, in 2014, I was hired for a consultancy project in another State Public Agency in Brazil. At that time, the Administration had a great expectation in a project, which had been under development for two years, called “DW” which consisted of the structuring of a reliable database established in a server as part of the operating environment. The data was transported daily to this environment through a system based on ACL (Audit Command Language) which carried out the recovery procedures in “Batches” (Lots).
Although our project gave some support to the DW project, at the end of another three years the DW project was declared non-viable. In other words, after 5 years, the expectation of having a database to retrieve essential information for the Administration was frustrated.
Thinking of a transition solution until we had the new system and a DW that met the Administration’s expectations, I recovered the JONAS system described above that was on platforms no longer supported (Windows XP and Delphi 5), which was installed in a virtual machine, which could be an alternative in the process of finding a solution that would meet the information needs of the administration.
The JONAS was configurated over the Operational Database that served the Administration and did not require any adjustment for the presentation of the prototype as a proposal of an alternative path. Unfortunately, our project did not have deadlines or resources for the development of solution in the current platforms, which ended up occurring later, with the construction of JONAS 2.0 using other resources.
The described solution is very adequate for the systematic or temporary recovery of information, while awaiting or not for other solutions. Its information capacity is something impressive. Let us see the following example as an exercise:
Assuming there are 20 Themes, each with 5 recoverable information attributes (a dimension, for example, Municipality, is unique regardless of the possible values) we can make the following estimate:
In total 20 x 5 = 100 attributes. From the combinatory analysis:
C 100,1 | = | 100! / (1! * 99!) | = | 100 |
C 100,2 | = | 100! / (2! * 98!) = 100 * 99 / 2 | = | 4.950 |
C 100,3 | = | 100! / (3! * 97!) = 100 * 99 * 98 / 6 | = | 161.700 |
That is: There are 166,750 combination options with up to 3 attributes, if there were more attributes …
Obviously, many of the combinations may not make sense, but they are available for the experts. We can say that: In the depths of the unprocessed data there is a wealth of information waiting to be highlighted.
View of the interface with Search History
View of the Information Analysis, selecting the Economic Activity Dimension
1,588 total views, 3 views today