Tax fraud directly and negatively affects business market conditions by creating unfair competition. Compared to competitors who do not pay taxes, companies that operate in accordance with the law have higher costs and, therefore, higher prices for products and services. Moreover, from the perspective of individual citizens, tax fraud erodes what they regard as basic rights (local infrastructure, health care, pensions…). Tax fraud reduces tax collection and thereby lowers the quality of public services and the room for improving them.

Detecting taxpayer fraud is the most difficult step in fiscal control. One of the primary goals of tax control is to monitor and check the financial operations of large companies/corporations as they represent the biggest risk bearers in the tax fraud field. At the same time, the tax fraud that is performed by small and micro companies is mostly done through cash transactions related to income and expenses. By using various independent sources, or by matching data and checking with other sources, tax authorities can determine taxpayers’ malicious actions. Also, the determination of taxpayers’ criminal actions requires a field tax inspection (randomly or on request), which requires a lot of time and financial resources from the tax administration (concerning the number of inspectors and additional material resources). It is a very difficult, and ultimately impossible, task for the tax administration to solely rely on field inspections for the factual verification of tax compliance. In this respect, audit work would be further eased by the application of some software tools that would help determine possible embezzlement, without violating the relationship of trust between the taxpayers and the administration itself.

TaxCore®, as a taxpayer monitoring system, records all significant elements of every single fiscal transaction, which enables tax authorities to monitor tax collection and transaction records. The platform is based on taxpayers’ identity and data protection. It is very intuitive, allowing tax officials to search taxpayers according to several parameters. This enables tax officials to easily locate and use important information, monitor trends in the taxpayer’s business, etc. This software solution provides the tax administration with notifications about each commercial transaction in a way that enables risk analysis and remote verification.

One of TaxCore’s® advantages is the unification of the tax authority’s data for all taxpayers in a country. In this way, a large and important database is formed, from which meaningful and significant results/reports can be generated and predictions about potential future tax frauds can be made. The unification of all taxpayers’ transaction data is not achieved through centralization, in the sense of storing data in one place, but rather in the form of complete information. Accumulated data provide comprehensiveness: once the information is grouped and weighted by importance, it becomes more useful in various analyses, which in turn can be used for future predictions.

Tax inspection efficiency could be improved by applying a new approach in which the first step would be to define basic parameters by using the data mining process. The objective is to develop algorithms that detect tax violations by using advanced methods of big data analysis and artificial intelligence with the help of machine learning. Big data are considered a new type of resource, in terms of business assets, and are used to improve business processes and increase productivity in the coming period. According to the available literature, it is estimated that, in addition to the financial sector, the area in which big data will increase productivity the most is the public sector, i.e., the state administration [1]. The application of machine learning methods to large databases in the tax administration would provide valuable insights into the historical behavior of taxpayers. Based on this, recommendations for field inspections could be obtained. In addition to improving the effectiveness of field inspections, this approach would also enable the creation of risk categories for taxpayers. The idea is to create patterns based on historical indicators of certain taxpayer attributes; depending on the degree of matching with these patterns, a taxpayer would be assigned a certain level of risk.

Extracting Patterns and Models

Extracting patterns of possible scenarios for tax fraud would be based on the taxpayers’ historical behavior and would work by monitoring certain (predefined) attributes and relying on some of the models from the literature (artificial neural networks, Bayesian networks, logistic regression…). It is necessary to categorize the risk of the taxpayer into, for instance, low/medium/high. Depending on the degree of match (probability) with the defined tax fraud scenarios, the taxpayer would be assigned a category based on the level of risk. One of the benefits of developing risk indicators for individual taxpayers is the possibility of using these indicators to rank all taxpayers according to a defined risk level. Machine learning would be used to discover patterns and relationships among attributes that are useful for identifying “suspicious” behavior by taxpayers. It would be used to select suspicious taxpayers, who would then be forwarded to inspectors for further checks. The goal of this approach is to increase the productivity of tax inspectors in the field and to recover lost tax revenue. Compared to the manual search method, this data mining technique is a more modern (scientific) approach that would spare resources and avoid personal judgements in selecting “suspicious” taxpayers.
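The categorization and ranking described above can be sketched in a few lines. This is only an illustration under stated assumptions: the probability is taken to come from some upstream model, and the cut-off values and taxpayer identifiers are hypothetical, not part of TaxCore®.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-offs; real thresholds would be set by tax experts.
MEDIUM_THRESHOLD = 0.4
HIGH_THRESHOLD = 0.7


def risk_category(probability: float) -> str:
    """Map a fraud-scenario match probability (0..1) to a risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= HIGH_THRESHOLD:
        return "high"
    if probability >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def rank_taxpayers(scores: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Sort taxpayers from highest to lowest score and attach a category."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(taxpayer, p, risk_category(p)) for taxpayer, p in ranked]


# Illustrative scores only; the identifiers are made up.
scores = {"TP-001": 0.15, "TP-002": 0.82, "TP-003": 0.55}
for taxpayer, p, category in rank_taxpayers(scores):
    print(taxpayer, p, category)
```

Keeping the thresholds as named constants makes it easy for experts to adjust the boundaries between risk levels without touching the ranking logic.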

The basic starting point for determining future potential tax frauds is to distinguish intentional fraud from an accidental mistake made by the taxpayer. The term “fraud” refers to all cases of willful non-compliance with tax regulations. Tax fraud is often treated as a synonym for tax violation. However, a tax violation covers every case in which taxpayers have not settled their tax liability, whereas tax fraud involves the taxpayer’s intention to circumvent the law to avoid paying taxes. Thus, tax frauds are a subset of tax violations. The delineation would be based on all realistic scenarios of tax violations drawn from historical data, and the main indicator for distinguishing intent from mere coincidence would be the frequency of violation cases.

Many analytical anomalies are to be expected when starting data mining. High-risk anomalies should be distinguished from low-risk anomalies. Those anomalies that could be foreseen should not be taken into further consideration. Knowledge of the tax system, precisely through TaxCore®, will make it possible to separate normal and expected anomalies from those that could be characterized as potential tax frauds.

The first step is to classify the available data by its correctness (negative amounts, empty fields, formats, duplicate values, uneven values…). The accuracy of the model depends on the correctness of the input data, which in turn largely depends on how accurately the data were entered. This brings us back to the very beginning: the correctness of the input information is the responsibility of the taxpayer, who uses the components for issuing invoices following […]

The challenge is to define the initial conditions and attributes based on which taxpayers’ risky behavior scenarios would be created. Malicious models of behavior should be defined, that is, rules should be formed that correspond to known or hypothetical frauds. If there is no feedback from experts, we need to generate synthetic data ourselves, covering both legal and illegal transactions. By interpreting the existing data, their interconnections and intersections, and by applying a model to that data, the goal should be reached: determining whether a taxpayer’s transaction can be interpreted as fraud with a certain accuracy. Moreover, it is good to apply different machine learning methods (K-nearest neighbors, decision tree classifier, artificial neural networks, logistic regression…) to determine how these methods handle the input data and how accurate their results are. If the obtained results were forwarded to the tax administration, which would verify them by going out into the field, this would provide feedback on the (in)accuracy of the method. This could be considered the only valid confirmation of the method itself. In this regard, the method of choice would be the one that produces the most accurate results.
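The correctness checks listed at the start of this step (empty fields, formats, negative amounts, duplicate values) can be sketched as a simple pre-screening pass. The field names and sample rows are hypothetical, and the sketch assumes that sale invoices should never carry a negative amount; it is not a description of TaxCore® validation.

```python
from typing import Dict, List

# Hypothetical field names, chosen only for this illustration.
REQUIRED_FIELDS = ("invoice_id", "taxpayer_id", "amount")


def validate_invoices(invoices: List[dict]) -> Dict[str, List[int]]:
    """Return the row indexes that fail each basic correctness check."""
    problems: Dict[str, List[int]] = {
        "empty_field": [], "bad_format": [], "negative_amount": [], "duplicate": [],
    }
    seen_ids = set()
    for index, row in enumerate(invoices):
        # Empty or missing required fields make the row unusable.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            problems["empty_field"].append(index)
            continue
        # The amount must parse as a number.
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            problems["bad_format"].append(index)
            continue
        if amount < 0:
            problems["negative_amount"].append(index)
        # A repeated invoice identifier is treated as a duplicate.
        if row["invoice_id"] in seen_ids:
            problems["duplicate"].append(index)
        seen_ids.add(row["invoice_id"])
    return problems


sample = [
    {"invoice_id": "A1", "taxpayer_id": "TP-001", "amount": "120.50"},
    {"invoice_id": "A2", "taxpayer_id": "TP-001", "amount": "-15.00"},
    {"invoice_id": "A1", "taxpayer_id": "TP-001", "amount": "120.50"},
    {"invoice_id": "A3", "taxpayer_id": "", "amount": "10.00"},
    {"invoice_id": "A4", "taxpayer_id": "TP-002", "amount": "12,5x"},
]
report = validate_invoices(sample)
print(report)
```

Rows flagged here would be excluded or corrected before any model is trained, since the model's accuracy depends on the correctness of its input.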

If real historical data on proven tax frauds exist, taxpayers likely to commit fraud in the future would be identified using supervised methods. The model would search the transactions of all taxpayers in the database and identify those whose characteristics (behavior) resemble those of taxpayers for whom tax fraud has been proven. If there is no knowledge or available information about existing tax frauds, data mining would be performed using unsupervised machine learning methods, although these offer lower precision and interpretability than supervised methods. Unlike the supervised approach, the unsupervised approach would identify not only cases of tax fraud but also economic entities that pay their tax obligations irregularly, and it would point out taxpayers’ suspicious behavior. These methods can support tax inspectors’ verification work in establishing tax crime. They are also suitable for supporting risk management decisions on tax fraud, helping to prioritize tax controls and make tax collection more efficient.
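As a minimal sketch of the supervised idea, the snippet below labels a new taxpayer by majority vote among the most similar taxpayers with a known outcome (a plain K-nearest-neighbors rule). The two features (refund share, cash share), the labels, and all values are hypothetical, and the sketch assumes the features are already on comparable scales.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Each history entry: (feature vector, known outcome). Values are made up.
History = List[Tuple[Sequence[float], str]]


def knn_label(history: History, candidate: Sequence[float], k: int = 3) -> str:
    """Label a candidate by majority vote among its k nearest neighbors."""
    if k < 1 or k > len(history):
        raise ValueError("k must be between 1 and the number of history rows")
    # Euclidean distance between feature vectors (math.dist, Python 3.8+).
    nearest = sorted(history, key=lambda item: math.dist(item[0], candidate))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (refund share, cash share of turnover).
history: History = [
    ((0.02, 0.40), "compliant"),
    ((0.03, 0.45), "compliant"),
    ((0.01, 0.50), "compliant"),
    ((0.20, 0.90), "fraud"),
    ((0.25, 0.95), "fraud"),
    ((0.18, 0.85), "fraud"),
]

print(knn_label(history, (0.22, 0.88)))
print(knn_label(history, (0.02, 0.42)))
```

When no proven cases are available, the labeled history does not exist and an unsupervised method (for example, flagging taxpayers far from every group) would have to take its place.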

Examples from practice

The following is a brief overview of how the TaxCore® solution could be applied, within current machine learning trends, to predict which taxpayers might commit tax fraud in the future. Refund invoices are a special invoice category that deserves attention. According to global information, at the level of retail businesses, as much as 28% of all fraud is committed by employees through issuing refunds [2]. For this reason, when looking for possible tax fraud, emphasis should be placed on refunds. Special attention should be paid to employees who possess additional credentials (e.g., managers [3]) because they have privileges for additional discounts, coupons for subsequent purchases, and the like. Some of the possible ways to track refunds are:

    • Scenario 1: Monitoring the share of refunds in the total number of invoices issued by a taxpayer, compared across all employees, and tracking how often it occurs on a daily, weekly, etc. basis. A large share of refunds in the taxpayer’s total sales, if often repeated, is an alarm signal.
    • Scenario 2: Monitoring the number of canceled items per invoice for each taxpayer and comparing it across employees. A large number of cancellations by one taxpayer in relation to sales, if often repeated, is an alarm signal.
    • Scenario 3: Monitoring the price oscillation of an individual item where the price is increased or decreased compared to the average, which may indicate manipulation of the reported and actual selling price, as well as an illegal increase (price gouging).
    • Scenario 4: Monitoring customer reports of suspicious transactions related to a specific point of sale.
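Scenario 1 can be sketched as follows. The snippet counts, per employee, the days on which the refund share exceeded a limit, and raises an alarm only when this is repeated. The limit, the minimum number of repeated days, the invoice type names, and the data are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

REFUND_SHARE_LIMIT = 0.10  # hypothetical daily share that counts as "large"
MIN_ALARM_DAYS = 3         # hypothetical number of repeats that triggers an alarm


def refund_alarms(invoices: Iterable[Tuple[str, str, str]]) -> List[str]:
    """invoices: (day, employee, invoice_type) tuples; returns flagged employees."""
    totals = defaultdict(int)
    refunds = defaultdict(int)
    for day, employee, invoice_type in invoices:
        totals[(employee, day)] += 1
        if invoice_type == "refund":
            refunds[(employee, day)] += 1
    # Count the days on which each employee exceeded the refund share limit.
    alarm_days = defaultdict(int)
    for (employee, day), total in totals.items():
        if refunds[(employee, day)] / total > REFUND_SHARE_LIMIT:
            alarm_days[employee] += 1
    return sorted(e for e, days in alarm_days.items() if days >= MIN_ALARM_DAYS)


# Made-up data: E1 refunds 3 of 10 invoices every day, E2 only on one day.
data = []
for day in ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]:
    data += [(day, "E1", "sale")] * 7 + [(day, "E1", "refund")] * 3
    data += [(day, "E2", "sale")] * 10
data += [("2024-01-01", "E2", "refund")] * 2

print(refund_alarms(data))
```

Separating the share limit from the repetition count reflects the point that a single high-refund day is not a signal by itself; only repetition is.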

The essence of these scenarios is the repetition or frequency of the events. An event that never happens, or that happens once or an insignificant number of times in the observed period, cannot be marked as a potential tax fraud. It is also necessary to define several mutually independent risk categories related to different concepts (weight factors of initial conditions, frequency, taxpayers’ risk categories…). Risk assessment is a very subjective process; however, if certain methodologies and principles are applied, subjectivity can be reduced to the lowest possible level. Based on the available data in the TaxCore® database, certain rules can be identified, and risk types, as well as risk acceptability levels, can be defined with expertise. Risk assessment would entail making decisions based on real data and the experience of experts. Every risk event is accompanied by its frequency. In this respect, it becomes necessary, based on the experts’ knowledge, to define frequency intervals according to which the taxpayers’ behavior would be delimited. The essence of the problem lies in setting thresholds, both for the risk levels of the observed event and for the frequency intervals.

TaxCore® makes it possible to track and separate the time of invoice issuance from the time of invoice reception in the database. This information is very important because, by monitoring the time of invoice reception, it can be determined whether invoices accumulate in a particular part of the day (e.g., the end of working hours), whether there are gaps during working hours, how often these are repeated, etc.
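A rough sketch of such a reception-time check is shown below. It computes the share of invoices received in the last working hour and lists the working hours with no invoices at all. The working hours and timestamps are hypothetical, and how TaxCore® exposes reception times is not assumed here.

```python
from collections import Counter
from datetime import datetime
from typing import List, Sequence, Tuple


def reception_profile(
    received_at: Sequence[datetime], opening_hour: int, closing_hour: int
) -> Tuple[float, List[int]]:
    """Return (share of invoices in the last working hour, hours with no invoices)."""
    per_hour = Counter(ts.hour for ts in received_at)
    working_hours = range(opening_hour, closing_hour)
    total = sum(per_hour[hour] for hour in working_hours)
    last_hour_share = per_hour[closing_hour - 1] / total if total else 0.0
    gap_hours = [hour for hour in working_hours if per_hour[hour] == 0]
    return last_hour_share, gap_hours


# Made-up day: two invoices in the morning, six just before closing at 17:00.
received = [datetime(2024, 1, 15, 9, 5), datetime(2024, 1, 15, 10, 30)]
received += [datetime(2024, 1, 15, 16, m) for m in (40, 45, 50, 52, 55, 58)]

share, gaps = reception_profile(received, opening_hour=9, closing_hour=17)
print(share, gaps)
```

A high last-hour share combined with long gaps, if repeated over many days, would be the kind of accumulation the paragraph above describes.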

Tax rates can also be checked through the TaxCore® system, in terms of whether taxpayers really apply the tax rates they have declared. Also, it can be determined whether, at the level of one taxpayer, several different tax rates are calculated for the same item, as well as whether there is a mixing of tax rates when issuing an invoice by a taxpayer who is issuing several tax categories.
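Both checks can be sketched over invoice lines. The tuple layout, item codes, and rates below are hypothetical and serve only to illustrate the two questions: does one taxpayer charge several rates for the same item, and does a taxpayer apply a rate it has not declared?

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each line: (taxpayer_id, item_code, tax_rate in percent). Layout is assumed.
Line = Tuple[str, str, int]


def items_with_mixed_rates(lines: Iterable[Line]) -> Dict[Tuple[str, str], List[int]]:
    """Find items for which one taxpayer has applied more than one tax rate."""
    rates = defaultdict(set)
    for taxpayer_id, item_code, tax_rate in lines:
        rates[(taxpayer_id, item_code)].add(tax_rate)
    return {key: sorted(found) for key, found in rates.items() if len(found) > 1}


def undeclared_rates(lines: Iterable[Line], declared: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    """Find rates a taxpayer applied on invoices but did not declare."""
    found = defaultdict(set)
    for taxpayer_id, _item_code, tax_rate in lines:
        if tax_rate not in declared.get(taxpayer_id, set()):
            found[taxpayer_id].add(tax_rate)
    return {taxpayer_id: sorted(rates) for taxpayer_id, rates in found.items()}


lines = [
    ("TP-001", "COFFEE", 20),
    ("TP-001", "COFFEE", 10),
    ("TP-001", "BREAD", 10),
    ("TP-002", "COFFEE", 20),
]
declared = {"TP-001": {10, 20}, "TP-002": {10}}

print(items_with_mixed_rates(lines))
print(undeclared_rates(lines, declared))
```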

In addition to this, trends of the following elements can be monitored at a taxpayer level for arbitrary time intervals (daily, weekly, monthly, quarterly, at an annual level, tax calendar, beginning and end of the fiscal period or any period for submitting fiscal returns, since the period immediately before is very interesting for withdrawing money and reducing turnover):

-Number of issued invoices
-Tax amount
-Max and min number of issued invoices
-Cash/card ratio for types of payment
-Types of transactions and their percentages by number and amount in the total number/amount of transactions (double monitoring at the level of taxpayers and taxpayer comparison with the average trend at the level of business activity), etc.

These data can be important for monitoring the degree of deviation of the trend at one taxpayer’s level in relation to the trends at the level of their business activity, according to the selected parameter.

Example 1
One example of trend monitoring according to one parameter would be to determine how the trend in a taxpayer’s number of invoices deviates from the trend of the average number of issued invoices at the level of their business activity. Deviation intervals should also be defined: (1) what percentage of deviation is tolerated, (2) what thresholds would require additional investigation without declaring the taxpayer a possible fraud perpetrator, and (3) what percentage of deviation, and anything higher than that, would mark the taxpayer as a potential fraud perpetrator and alert the tax inspectors to carry out a field inspection of that taxpayer. Along with the deviation percentage, its frequency must be monitored, and the frequency of deviations on a weekly basis must carry a different weight (significance) than the frequency on a monthly basis. For example, two deviations on a weekly basis do not have the same weight as two deviations on a monthly basis.
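The three deviation intervals and the frequency weighting can be sketched as below. All numbers are hypothetical. In particular, the text only says that weekly and monthly deviations must differ in weight; treating weekly deviations as the heavier ones is an assumption made for this sketch.

```python
from typing import Dict

TOLERATED_LIMIT = 0.10  # hypothetical: deviations below this are tolerated
ALERT_LIMIT = 0.25      # hypothetical: deviations at or above this trigger an alert

# Assumed weights: which period weighs more is not stated in the text.
PERIOD_WEIGHT = {"weekly": 2.0, "monthly": 1.0}


def deviation_zone(taxpayer_count: float, activity_average: float) -> str:
    """Classify the relative deviation from the business-activity average."""
    if activity_average <= 0:
        raise ValueError("activity_average must be positive")
    deviation = abs(taxpayer_count - activity_average) / activity_average
    if deviation < TOLERATED_LIMIT:
        return "tolerated"
    if deviation < ALERT_LIMIT:
        return "investigate"
    return "alert"


def weighted_deviation_score(deviation_counts: Dict[str, int]) -> float:
    """Sum deviation counts, weighting each by the period in which it occurred."""
    return sum(PERIOD_WEIGHT[period] * count for period, count in deviation_counts.items())


print(deviation_zone(98, 100), deviation_zone(80, 100), deviation_zone(60, 100))
print(weighted_deviation_score({"weekly": 2}), weighted_deviation_score({"monthly": 2}))
```

With these assumed weights, two deviations within a week score twice as high as two deviations within a month, which is one way to express that they do not carry the same significance.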

Example 2
Based on several examples from practice and their recorded turnover in TaxCore®, it was observed that the number of issued invoices decreases from month to month. Also, on certain working days not a single invoice is issued (there are no recorded invoices in the system). It should be noted that these taxpayers operate in the hospitality industry, and that it is common for some industries not to have daily sales. From these examples, a scenario can be defined: if the number of invoices tends to decrease and on some days no sales are recorded, then, with both conditions met, the taxpayer enters the red zone, which means that the taxpayer is passed on to the tax inspectors for additional controls.
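The red-zone rule from this example can be sketched directly. As a simplifying assumption, a "decreasing trend" is taken here to mean that every month has fewer invoices than the previous one; a real implementation would likely use a more tolerant trend measure. All counts are made up.

```python
from typing import Sequence


def is_decreasing(monthly_counts: Sequence[int]) -> bool:
    """True if each month has fewer invoices than the month before (simplified trend)."""
    return len(monthly_counts) >= 2 and all(
        later < earlier for earlier, later in zip(monthly_counts, monthly_counts[1:])
    )


def has_zero_sales_days(working_day_counts: Sequence[int]) -> bool:
    """True if at least one working day has no recorded invoices."""
    return any(count == 0 for count in working_day_counts)


def in_red_zone(monthly_counts: Sequence[int], working_day_counts: Sequence[int]) -> bool:
    """Both conditions must hold for the taxpayer to be passed to inspectors."""
    return is_decreasing(monthly_counts) and has_zero_sales_days(working_day_counts)


print(in_red_zone([420, 380, 310], [15, 0, 12, 9, 0]))
print(in_red_zone([420, 380, 310], [15, 3, 12]))
```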

Every type of fraud leaves certain “traces” in the data. TaxCore® records events in real time, leaving no possibility for subsequent, retroactive data changes that would “override” old data. This further implies that absolutely every change that occurs at the level of every piece of information is stored in the database. In this way, big data is created, and searching and analyzing it can reveal models of potential future misdemeanor actions. Humans are not able to manually analyze large databases or to define and extract patterns and scenarios from the data. Advanced machine learning methods are well suited to mining large databases and identifying scenarios. The patterns that machine learning methods would search for in the data can be defined based on real historical indicators belonging to taxpayers who have committed tax frauds or by forming fictitious tax fraud scenarios. Artificial intelligence methods can be used in both directions. The first direction is defining and singling out taxpayers who have committed tax violations; the second is their use as tools for committing fraud. Namely, fake data sets can be created, depending on the tax categories, and then refined with AI algorithms until they match real data to the desired level of accuracy. Such a data set, practically indistinguishable from real data, can be submitted as tax information that would mislead tax inspectors. In this respect, it is necessary to include advanced technologies in the taxpayer monitoring process to prevent and, where possible, limit the use of AI in tax fraud.

Given that, under current conditions of globalization and information technology development, the number of risks increases drastically, it is necessary for the financial stability of the country to define measures and approaches for determining potential tax frauds. In this regard, adapting to the world’s financial turmoil becomes necessary, as does considering a modern approach to identifying tax fraud. TaxCore®, due to its comprehensive application, innovative approach, technology, and theory, would enable tax authorities to manage future tax fraud risks, which would result in further economic progress and growth of the country.


Text Author: Jelena Lukić, business analyst, Data Tech International, d.o.o.