AI and Regulation: How to Explain the ‘Black Box’ to Tax Authorities


According to a market study on Explainable Artificial Intelligence (XAI) published by MarketsandMarkets, the XAI market was projected to reach $6.2 billion by the end of 2023, and experts predict the figure will rise to $16.2 billion by 2028.

The anticipated increase in demand for XAI solutions tracks the significant controversies surrounding AI used by regulators, such as the one that occurred in the Netherlands. We spoke with Feride Osmanova, a back-end developer at LOVAT, about strategies for addressing the “black box” issue in regulatory AI.

As a reminder, the Netherlands became one of the first EU countries to automate most tax collection procedures using AI, starting in 2004. In 2021, a major scandal emerged when it was revealed that the algorithms had mistakenly revoked previously awarded childcare benefits from 35,000 parents.

Feride Osmanova is an informed insider in the field of tax accounting and business process automation. She has executed several large-scale LegalTech projects for LOVAT, an international technology company headquartered in the UK.

Notably, Osmanova developed and implemented key elements and integrations on the back end of LOVAT’s specialised cloud platform for tax compliance. The platform utilises AI to automate tax reporting and other related processes.

Feride, what does the “black box” problem represent in the context of AI utilised by regulatory authorities?

Countries such as the USA, India, Italy, and many other European nations are actively employing AI-based software for tax collection, identifying tax fraud, preparing reports, and processing documentation.

In the regulatory sphere, there are some intriguing projects tackling complex issues. For instance, British tax authorities use the AI system Connect, which scrapes information about the private lives of individuals under scrutiny, including social media activity, flight records, and data from tax returns. When there is a risk of tax misconduct, investigations are initiated based on this information. It is known that 90% of these risk-oriented investigations currently begin with insights generated by AI.

However, the application of Explainable AI in tax dispute resolution remains at a prototype stage in some areas and is not yet widely adopted. It is often quite challenging to clearly articulate why a model, which continuously learns through machine learning algorithms and utilises a multi-architecture framework of neural networks, reaches certain conclusions. This poses difficulties in judicial proceedings concerning penalties, especially when attempting to validate the actions of AI that has imposed sanctions on a taxpayer.

For example, it is relatively straightforward to explain why a marginal increase in a variable of a linear regression model led to a specific outcome: the effect is simply the variable’s coefficient. With advanced techniques this is much harder. Neural networks, for example, form their own internal rules as they process information, which makes their decision-making difficult to follow.
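The contrast can be made concrete with a minimal sketch; the model, its coefficients, and the feature names here are hypothetical. In a linear model, the marginal effect of any input is, by construction, its coefficient, which is why such models are easy to explain:

```python
# Illustrative sketch: in a linear model, the effect of each input is explicit.
# A coefficient of 0.2 on income means every extra unit of income raises the
# predicted liability by exactly 0.2 - no further analysis is needed.

def linear_predict(features, weights, bias):
    """Linear model: prediction = bias + sum(weight_i * feature_i)."""
    return bias + sum(w * x for w, x in zip(weights, features))

weights = [0.2, -0.05]   # hypothetical rates for income and deductions
bias = 100.0

base = linear_predict([50_000, 2_000], weights, bias)
bumped = linear_predict([50_001, 2_000], weights, bias)  # income +1

# The marginal effect equals the first coefficient, by construction:
print(round(bumped - base, 6))  # 0.2
```

A neural network offers no such shortcut: its “coefficients” are millions of weights entangled across layers, so the effect of one input cannot be read off directly.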

New outcomes may vary, much like human reasoning, meaning that a re-evaluation of the same data may not align with previous analyses conducted by the neural network. This encapsulates the essence of the “black box” phenomenon: we obtain results from AI, but explaining the rationale behind those results gives rise to a series of legal, ethical, and technical dilemmas.

Fortunately, several methods grounded in mathematical and statistical research have emerged to help elucidate these results. I am referring to LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). The former was developed by Marco Ribeiro and colleagues in 2016, while the latter was introduced by Lundberg and Lee in 2017. The practice of forensic analysis in tax disputes related to AI decisions largely relies on these methods or their combinations.

Have you ever had projects that required “transparency” in AI decision-making? How did you solve the problem?

Of course, we have encountered this. For LOVAT, I developed a VAT reporting automation system in Python and other technologies. US and EU tax laws are very complex, and implementing our system saved up to 40% of the time spent working with tax documentation.

We are talking about the danger of AI errors, but with our system, which included predictive neural network analysis and other elements of AI and machine learning, the number of errors decreased by an average of 70% compared with the old way of working. Can you imagine what kind of savings that is for businesses? In fact, we completely automated tax compliance. I think that despite unpleasant incidents like the one in the Netherlands, the progress of AI in regulatory bodies cannot be stopped. Without it, it is simply impossible to handle the increasing volume of work and to service ever more complex and changing rules.

There is always a risk of errors when we talk about neural networks, but that does not mean the technology is not worth developing. Tools such as LIME and SHAP now come with ready-made software packages, making them easier to use in real projects. Developers can plug them into tax or compliance systems to help explain how AI models reach their decisions, and they integrate with classic Python data-analysis frameworks such as Pandas, so explanations can be computed on the same data the models consume.

You have already partially answered this question, but still: in the use of AI for tax regulation, which outweighs the other, the pros or the cons?

Of course, there are more pros. In the Netherlands, for example, out of 12,000 tax complaints, 80% are processed by large language models using machine learning algorithms and natural language processing (NLP). This saves an astronomical number of man-hours. Yes, mistakes happen, and because of the “black box” problem on the one hand and the high speed of neural networks on the other, the damage can be enormous.

A system manages to do a great deal before an error is detected through taxpayer feedback, and the overall progress that neural networks represent does not cancel out such damage. In courts and tax regulators, AI can be used for more than the automatic analysis of tax returns. There are prototypes of software for analyzing big data from financial statements to identify tax evasion schemes, for auto-generating explanations of decisions on refusals and tax deductions, and for supporting forensic examinations: a court can now ask for an explanation of an AI’s work in controversial cases. Such practice already exists.

Are states legally obliged to disclose how AI works if they use it for regulatory and control purposes?

Absolutely. In some regions such legislation has not yet been developed, but in the EU, for example, there is the GDPR, which protects citizens in the age of digitalization. Explanations of AI decisions relate directly to Article 12, which guarantees taxpayers the right to an explanation of decisions made by government agencies, including those using digital tools.

Article 22 guarantees taxpayers the right not to be subject to purely automated decisions and to request human intervention in them. Finally, Article 6 requires that any processing of personal data, including by tax administrations, rest on a lawful basis.

Which of the existing approaches is better: SHAP or LIME?

They are usually used together; I would say they complement each other. To explain the essence of LIME roughly, consider the example of text. Say a large language model tokenizes a message, that is, breaks it down into semantic units, to understand its meaning.

Say it is important for the language model to determine the mood of a message, and the lion’s share of that sentiment is carried by the word “terrible” that a person used in the chat. LIME removes words one by one and studies how the result changes. Since it is much harder to see that the comment has a pronounced negative connotation without the word “terrible”, the neural network’s assessment should shift toward neutral. It is through experiments like these that LIME tries to make the actions of tax AI “transparent”.
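The word-removal experiment described above can be sketched in a few lines of Python. The sentiment scorer and its word weights here are a hypothetical toy stand-in, not the real LIME library; the point is only to show how dropping a word reveals its influence on the result:

```python
# Toy sentiment model: a simple word-weight lookup (hypothetical weights).
NEGATIVE_WORDS = {"terrible": -1.0, "awful": -0.8}
POSITIVE_WORDS = {"great": 1.0, "helpful": 0.6}

def sentiment_score(text):
    """Score a text by summing per-word sentiment weights."""
    score = 0.0
    for word in text.lower().split():
        score += NEGATIVE_WORDS.get(word, 0.0)
        score += POSITIVE_WORDS.get(word, 0.0)
    return score

def word_influence(text):
    """LIME-style attribution: drop each word and record the score change."""
    words = text.split()
    base = sentiment_score(text)
    influence = {}
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        influence[word] = base - sentiment_score(perturbed)
    return influence

scores = word_influence("the service was terrible")
print(scores["terrible"])  # -1.0: removing it shifts the score toward neutral
```

The real LIME library perturbs inputs in the same spirit but fits a small interpretable surrogate model over many such perturbations, rather than measuring one deletion at a time.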

SHAP approaches the assessment from the mathematical side: it applies Shapley values from cooperative game theory to measure how much each input feature contributed to the neural network’s output.
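For intuition, here is a sketch of the exact Shapley computation that underlies SHAP, run on a hypothetical two-feature “tax risk” model (the model and feature names are invented for illustration; real SHAP implementations approximate this sum efficiently for large models):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Split value_fn's output across features via the exact Shapley formula:
    average each feature's marginal contribution over all coalitions."""
    n = len(features)
    names = list(features)
    phi = {name: 0.0 for name in names}
    for name in names:
        others = [f for f in names if f != name]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = value_fn({f: features[f] for f in subset + (name,)})
                without_f = value_fn({f: features[f] for f in subset})
                phi[name] += weight * (with_f - without_f)
    return phi

# Hypothetical risk model: absent features default to 0.
def tax_risk(present):
    income = present.get("income", 0)
    flags = present.get("audit_flags", 0)
    return 0.001 * income + 5.0 * flags

phi = shapley_values({"income": 10_000, "audit_flags": 2}, tax_risk)
print(phi)  # {'income': 10.0, 'audit_flags': 10.0} - each feature's fair share
```

Because this toy model is additive, each Shapley value equals the feature’s standalone contribution; for entangled models like neural networks, the same formula fairly splits credit among interacting features, which is exactly what makes it useful in disputes.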

What other approaches to solving the problem exist, or are there only two methods?

In fact, improvements in the “black box” situation, where the learning ability and growing independence of AI conflict with the stability of legal requirements, can come from many directions. I know that Anthropic, one of the most significant developers of “smart” software, is currently experimenting with neural network architectures, including multilayer perceptrons, which imitate the human brain most closely.

The goal of the experiments is to achieve greater stability of results and more faithful adherence to the requirements embedded in the model during training. Explainable AI platforms are also emerging, for example Explain Low, which help check results in tax automation not only for large companies with the expertise and funds for expensive tools, but also for medium and small organizations. A whole department is not always needed any more: one specialist is enough, for example one like me, who can build an Explainable AI infrastructure.

This preserves the huge gain in time and resources that AI brings, while the risk of neural network errors, which probably outweighs the risk of errors by an ordinary human specialist, keeps shrinking. I will add that great hopes are associated with increasing the computing power behind neural network architectures.

AI accuracy is limited by computing power. Today’s largest neural networks have about 20 billion artificial ‘neurons,’ while the human brain has over 80 billion. This gap shows why AI can be powerful, but still not as reliable as human reasoning.

In the United States and China, computers based on new physical principles are being developed: quantum, optical, and even biological. In China, a great deal of equipment is also being produced to push traditional computing technology further. Sooner or later, AI will become more powerful, and its work in the regulatory sphere will become more accurate.

In 2021, DARPA summed up the results of its four-year XAI program, a large-scale research project on the state of Explainable AI. A review of hundreds of expert papers yielded several conclusions: users prefer AI results accompanied by explanations over bare results; if explanations are constructed poorly, the cognitive load on the user can significantly reduce the effectiveness of AI; and the demand for explanations grows in direct proportion to the complexity of the task being solved.

Osmanova and other experts in regulation and AI are at the forefront of work that allows us to better understand what is happening in the “black box.” We can no longer do without AI results, but we do not yet fully control how they are created.

Photo by Immo Wegmann on Unsplash

Owais has taken care of Hackread’s social media from the very first day. He is also pursuing chartered accountancy and does part-time freelance writing.