From algorithm to dialogue: the power of generative AI in chatbots and implications in tax administration
As artificial intelligence (AI) becomes more popular, many companies are rushing to adopt chatbots with the intention of saving time, reducing costs and maximizing efficiency. Gartner predicts that chatbots will become the main customer service channel by 2027. [1]
Chatbots are generally built on one of two types of technology: generative artificial intelligence (GenAI) or rule-based systems.
Generative AI chatbots have the advantage of a better understanding of language, adaptability and the ability to generate creative content. They learn from large amounts of text data and continuously update their knowledge to provide accurate and relevant answers. However, they may produce misleading or incorrect information and there are questions regarding privacy. [2]
Rule-based chatbots excel in scenarios with simple and predictable queries and offer a cost-effective solution. They lack the flexibility and creativity of generative AI chatbots, but they can efficiently handle FAQs and customer support tasks.
Generative AI chatbots use large language models (LLMs) to generate responses based on user requests. These models are trained on massive data sets, allowing them to understand and produce human-like responses using deep learning, neural networks and natural language processing.
By contrast, rule-based chatbots follow a collection of predetermined rules and use “if/then” statements to determine the appropriate response based on specific keywords, as in the sketch below. [3]
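The following minimal sketch, in Python, illustrates the rule-based approach just described; the keywords, answers and fallback message are purely illustrative and are not taken from any real administration or product.

```python
# Minimal sketch of a rule-based chatbot: a fixed keyword -> answer table
# combined with "if/then" matching. All keywords and answers are illustrative.

RULES = {
    "deadline": "The annual filing deadline is published on the official portal.",
    "refund": "Refund status can be checked in your online taxpayer account.",
    "rate": "Current rates are listed in the tax schedule on the official site.",
}

FALLBACK = "I did not understand the question. Please rephrase it or contact an agent."

def rule_based_reply(message: str) -> str:
    text = message.lower()
    for keyword, answer in RULES.items():
        if keyword in text:   # "if the message contains the keyword, then give answer X"
            return answer
    return FALLBACK           # no rule matched: fixed fallback, no improvisation

print(rule_based_reply("How do I check my refund?"))
print(rule_based_reply("Tell me a joke"))  # falls outside the rule set
```

A generative AI chatbot, by contrast, would pass the message to an LLM rather than to a fixed rule table, which is what gives it flexibility but also opens the door to hallucinations.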
Because of their broad applicability, their flexibility in handling complex topics and the more fluid dialogue they offer, which improves the overall customer experience, LLM-based chatbots are increasingly preferred by large organizations.
The most well-known generative AI models used in chatbots are:
- GPT (Generative Pre-trained Transformer): used in systems such as ChatGPT, known for its ability to generate coherent and accurate text in different contexts.
- LaMDA (Language Model for Dialogue Applications): Google model designed specifically for long and complex conversations.
- BERT (Bidirectional Encoder Representations from Transformers): although not purely generative, it improves the understanding of context.
- LLaMa (Large Language Model Meta AI): an LLM released by Meta AI in February 2023.
However, this type of chatbot is not without its problems. They may sometimes invent information or make mistakes (known as “hallucinations”), which calls into question their reliability in providing accurate data and can cause serious damage to the companies that adopt them.
Real-world problems
An example of this happened with Air Canada. Its chatbot mistakenly advised a customer to follow a “bereavement refund” policy that did not exist. In February 2024, a Canadian small claims tribunal ruled in favor of the customer, despite the airline’s attempt to blame the software by claiming the chatbot was a “separate legal entity responsible for its own actions”. [4]
In another case, the delivery company DPD, which uses artificial intelligence in its online chat to answer questions alongside human operators, had to disable part of its chatbot. After an update, the system began insulting customers and criticizing the company itself, showing how a single failure can harm corporate reputation.
The problem is not limited to the private sector. The chatbot MyCity, created by the City of New York to provide citizens with accurate information about local regulations and laws, made serious mistakes. Instead of helping, it ended up offering incorrect guidance, such as claiming that employers could appropriate part of their employees’ tips, or that there were no laws requiring workers to be notified of shift changes. [5]
In another case, a technology startup decided to bet on implementing a chatbot to optimize customer service and automate internal processes. To do so, it hired a prestigious firm to develop the tool, dedicated time and resources to training its team and eagerly awaited the launch. However, the result was not as expected. Customers found it difficult to interact with the chatbot, whose responses were generic and, in many cases, irrelevant. This not only frustrated users but also caused a considerable increase in complaints and a drop in customer satisfaction, leaving the company needing to rethink its technology strategy. [6]
It should be noted that the suppliers themselves have problems of their own. On December 26, 2024, ChatGPT, the popular artificial intelligence chatbot developed by OpenAI, suffered a global outage, ceasing to function properly and displaying error messages to users. OpenAI attributed the problem to an outside vendor and worked to resolve it.
In one tragic case, a Belgian man named Pierre (not his real name), concerned about climate change, interacted with an AI chatbot called Eliza. The conversations led Pierre to make extreme decisions, culminating in his suicide. This incident highlighted the risks of relying on chatbots for emotional support without proper supervision. [7]
These real cases are not presented to discourage the use of AI-based chatbots, but as a warning that, in addition to their immense benefits, they also have a “dark side” that needs to be managed.
Chatbots and binding queries
Returning to the tax world, a binding consultation is a formal mechanism provided for in tax legislation through which the taxpayer submits a specific question to the tax administration, and the response issued is mandatory for said administration. This procedure only applies to the specific case consulted and must be framed in a formal and regulated process.
If the query is made outside the official procedures, such as through telephone calls, emails or automated assistance portals, the response issued is considered only indicative and is not binding.
Due to the limitations mentioned above, chatbots, even those based on AI, would be recommended mainly to manage non-binding queries.
However, since binding queries are usually the most demanding for tax administrations in terms of resource consumption and legal certainty, adopting chatbots as a support tool in handling them could optimize this process.
An ideal approach would be for the tax administration to submit the binding query to the chatbot, which would generate a draft initial response to be reviewed and refined by the officials responsible for the official answer. Once completed, the final answer would be fed back into the chatbot system using machine learning techniques. This would allow the chatbot to “learn” from the process, in addition to helping detect possible inconsistencies and suggesting improvements to the response, as sketched below.
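The workflow described above can be summarized in code. The following Python sketch is purely illustrative: the function names, the LLM call and the archiving step are hypothetical placeholders, not an actual tax administration API.

```python
# Hypothetical sketch of the draft-review-learn loop described above.
# draft_with_llm, official_review and store_for_learning are placeholders.

from dataclasses import dataclass

@dataclass
class BindingQuery:
    query_id: str
    taxpayer_question: str

def draft_with_llm(question: str) -> str:
    """Placeholder for a call to the administration's LLM service."""
    return f"[DRAFT] Proposed answer to: {question}"

def official_review(draft: str) -> str:
    """Placeholder for the mandatory human revision and legal validation."""
    return draft.replace("[DRAFT] ", "")

def store_for_learning(query: BindingQuery, final_answer: str) -> None:
    """Placeholder: archive the validated pair so the model can learn from it later."""
    print(f"Archived {query.query_id}: {final_answer}")

def handle_binding_query(query: BindingQuery) -> str:
    draft = draft_with_llm(query.taxpayer_question)  # 1. chatbot produces a draft
    final = official_review(draft)                   # 2. official reviews and signs off
    store_for_learning(query, final)                 # 3. validated answer feeds future training
    return final

print(handle_binding_query(
    BindingQuery("BQ-2025-001", "Is service X subject to VAT under regime Y?")
))
```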
With this option, the tendency would be for human interventions to progressively decrease, speeding up responses and improving operational efficiency.
In addition, the knowledge accumulated by the chatbot would increase the accuracy and quality of responses to general non-binding queries, strengthening taxpayers’ trust in the services of tax administrations.
The context described for binding queries applies to the use of chatbots in several critical areas of tax administration.
Thus, the guidance is not to avoid AI-based chatbots, but to resolve or mitigate the risks they present while taking advantage of the great benefits these technologies offer.
Best practices
Resolving or mitigating the risks associated with AI-based chatbots relies on a combination of technical, ethical and operational best practices, ensuring greater effectiveness and reliability in taxpayer service.
Below are several important strategies, organized by requirement area. [8] Some apply to all types of chatbots; others are specific to generative AI chatbots:
- Quality data
Training of the model with reliable data appropriate to its context of operation: use up-to-date data sets, free of bias and incorrect information. It is essential to have an updated legal framework, as well as a history of legislative developments. This is not an easy requirement to meet, but “data governance” disciplines can support it. [9]
Prioritize data from verified sources.
Example: use databases owned by the tax administration and other government bodies, in addition to the Legislative and Judicial Branches; acquire reliable external databases, when necessary.
Continuous monitoring: periodically evaluate the data used to avoid outdated information or unwanted trends.
- Transparency and ethics
Statement of intentions and limitations: Inform the taxpayers about the purpose of the chatbot, its capabilities and its limitations.
Example: “This chatbot (or channel) provides general information and does not replace the binding queries channel.”
Source registration: whenever possible, include references or links that validate the answers provided.
Avoid bias: implement frequent audits of the system to identify and mitigate biases in data and algorithms. The literature offers examples of biased data and algorithms causing serious difficulties for taxpayers, for the tax administration itself and for the government. [10]
- Technical robustness
Constant updating of the model: Incorporate constant improvements, based on advances in technology and feedback from the taxpayers.
Error detection systems: Implement mechanisms that identify and report errors or inconsistencies in the system.
Ability to handle uncertainty: when it does not know an answer, the chatbot should be able to admit this and, if possible, redirect the taxpayer to reliable sources (see the sketch after this list).
- Security and privacy
Data protection: Make sure that the chatbot follows data protection rules, such as GDPR or LGPD.
Do not collect personal information without the taxpayer’s explicit consent.
Data encryption: Use secure methods to store and process the collected information.
Anonymization: avoid linking chatbot interactions to taxpayers’ identifying personal information.
- User experience
Clear and easy-to-use interface: Structure the answers in an accessible way, avoiding complex technical terms.
Taxpayer feedback: include channels for taxpayers to report problems or suggest improvements.
Capacity for continuous learning: Implement supervised machine learning to adjust the behavior of the chatbot to the needs of the taxpayers.
- Continuous assessment
Periodic testing: Perform technical audits and performance tests frequently.
Simulate scenarios to verify the reliability of responses in different contexts.
Quality metrics: Evaluate the taxpayer’s satisfaction, the error rate and the consistency of the responses.
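As a hedged illustration of the “ability to handle uncertainty” practice above, the following Python sketch declines to answer and redirects the taxpayer when a confidence estimate falls below a threshold. The scoring function, threshold and URL are illustrative assumptions, not parts of any real system.

```python
# Illustrative uncertainty fallback: answer only above a confidence threshold,
# otherwise admit uncertainty and redirect. score_answer is a placeholder for
# any real confidence estimate (log-probabilities, a verifier model, retrieval overlap).

CONFIDENCE_THRESHOLD = 0.75
OFFICIAL_GUIDANCE_URL = "https://tax-administration.example/official-guidance"  # placeholder

def score_answer(question: str, answer: str) -> float:
    """Placeholder confidence estimate between 0 and 1."""
    return 0.42  # dummy value for the sketch

def answer_or_redirect(question: str, candidate_answer: str) -> str:
    confidence = score_answer(question, candidate_answer)
    if confidence >= CONFIDENCE_THRESHOLD:
        return candidate_answer
    # Below the threshold: do not guess; admit uncertainty and redirect.
    return ("I am not confident enough to answer this reliably. "
            f"Please consult the official guidance at {OFFICIAL_GUIDANCE_URL} "
            "or submit a formal binding query.")

print(answer_or_redirect(
    "Which regime applies to cross-border digital services?",
    "Regime Z applies.",
))
```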
Perspectives and final comments
Chatbots based on generative AI have great potential to lead the segment in tax administrations, thanks to their ability to offer more dynamic and personalized interactions. However, in order to be massively adopted, they must overcome significant challenges related to accuracy, implementation costs and regulatory compliance.
Their definitive consolidation will depend both on technological advances and on the adoption of ethical and safe practices that strengthen their reliability. With the arrival of new, more precise language models, such as GPT-5 or Google’s Gemini, a reduction in errors and a significant improvement in the quality of responses are expected.
In addition, an emerging trend is the combination of generative AI with traditional rule-based systems, in order to achieve an optimal balance between creativity and precision, thus ensuring more efficient and safer interactions for users.
In terms of future trends, specialists point to SLMs (Small Language Models), which, through techniques such as knowledge distillation, allow a smaller model, known as the “student”, to be trained to mimic the behavior of a larger, more complex model, called the “teacher”. In this way, much of the teacher model’s knowledge is transferred to the student model, preserving high accuracy with a considerably lower computational load. This technique is especially useful in on-premise deployments, where hardware resources are usually limited. [11]
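For readers curious about the mechanics, the following PyTorch sketch shows a minimal knowledge distillation step under assumed settings: the model sizes, temperature and loss weights are illustrative, and random tensors stand in for real training data.

```python
# Minimal knowledge distillation sketch: a small "student" is trained to match
# the softened output distribution of a larger "teacher". All sizes, the
# temperature T and the weight alpha are illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))  # larger model (frozen)
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))    # much smaller model

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T, alpha = 2.0, 0.5  # temperature and weight between hard- and soft-label losses

def distillation_step(x, labels):
    with torch.no_grad():
        teacher_logits = teacher(x)                      # teacher predictions, no gradient
    student_logits = student(x)
    hard_loss = F.cross_entropy(student_logits, labels)  # usual supervised loss on true labels
    soft_loss = F.kl_div(                                # match the teacher's softened distribution
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random data standing in for real training batches.
x = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
print(distillation_step(x, labels))
```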
Finally, it is generally recommended to consult an IMF publication that addresses relevant topics on the use of AI in tax and customs administrations, including use cases detailing the most appropriate AI technology, the potential value, the maturity level, explainable AI requirement, the degree of human intervention in the process, and the risk classification. [12]
References
[1] See: https://www.gartner.com/en/newsroom/press-releases/2022-07-27-gartner-predicts-chatbots-will-become-a-primary-customer-service-channel-within-five-years
[2] Usually when learning is done with data in the cloud or in external environments.
[3] See: https://www.toolify.ai/ai-news/decoding-generative-ai-chatbots-a-comparison-with-rulebased-chatbots-2655743
[4] See: https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know
[5] See: https://www.cnnbrasil.com.br/internacional/prefeito-de-ny-defende-chatbot-que-aconselhou-empresarios-a-infringir-a-lei/?hidemenu=true
[6] See: https://evolvy.com.br/blog/caso-de-insucesso-na-implementacao-de-chatbot/
[7] See: https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-
[8] These strategies were collected in specialized publications and reviewed/adapted by the author.
[9] See the CIAT publication “Data governance for tax administrations. A practical guide / 2024”, available at: https://biblioteca.ciat.org/opac/book/5884
[10] See the case that occurred in the Netherlands: https://es.euronews.com/2021/01/13/el-escandalo-por-la-discriminacion-racial-en-las-ayudas-familiares-cerca-al-gobierno-de-ru
[11] See: https://www.sdggroup.com/es-es/insights/blog/las-nuevas-fronteras-de-la-ia-generativa-multimodalidad-y-sml
[12] See: https://www.elibrary.imf.org/downloadpdf/view/journals/005/2024/006/article-A000-en.pdf