AI is rapidly reshaping supervisory controls across financial services, with 94% of firms either deploying or planning to deploy AI-based detection tools. From communications surveillance to misconduct monitoring, AI is now positioned as a cornerstone of modern compliance frameworks.
Yet behind the bold marketing claims, particularly around dramatic reductions in false positives, lies a more complex reality that firms cannot afford to ignore, according to Theta Lake.
According to a report by the Financial Industry Regulatory Authority (FINRA) on AI applications in the securities industry, AI systems can ingest and analyse vast volumes of structured and unstructured data, including text, voice, images and video, sourced internally and externally. This expanded capability enables firms to identify behavioural patterns and anomalies at scale, offering more holistic and risk-based supervision across the enterprise. In theory, such technology promises sharper detection and greater operational efficiency.
A central selling point of these tools is their purported ability to slash false positives (FPs), thereby freeing compliance teams to focus on genuinely suspicious activity. It is therefore unsurprising that RegTech vendors frequently compete on headline reduction rates. However, Rohit Jain, distinguished engineer at Theta Lake, argues that firms should approach these claims with caution. Drawing on more than two decades of experience in machine learning, he highlights the structural limitations that make false positives both persistent and, to some degree, inevitable.
In compliance monitoring—whether for insider trading, collusion or workplace misconduct—AI systems are effectively searching for a handful of problematic interactions buried within millions of routine communications. A false positive arises when a model flags benign behaviour as suspicious. In text-based systems, linguistic ambiguity, sarcasm and industry jargon can all trigger incorrect alerts. The result is a digital equivalent of “crying wolf”.
The operational consequences are significant. Each flagged item demands human review. If analysts spend just 10 minutes investigating a single false alert and the system generates 100 FPs daily, nearly 17 hours of productivity are lost every day. Over time, this becomes a structural drain on compliance resources.
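The arithmetic behind that figure is straightforward; a minimal sketch, using the article's assumed numbers (100 false positives a day, 10 minutes of review each):

```python
# Illustrative arithmetic from the article: 100 false positives per day,
# each taking roughly 10 minutes of analyst review (assumed figures).
false_positives_per_day = 100
minutes_per_review = 10

hours_lost_per_day = false_positives_per_day * minutes_per_review / 60
print(f"{hours_lost_per_day:.1f} analyst-hours lost per day")  # prints 16.7
```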
More concerning is the psychological effect. When reviewers repeatedly encounter incorrect alerts, they become desensitised. Fatigue and repetition encourage faster, less critical scanning. In this environment, the likelihood of dismissing a genuine case of misconduct increases. The system designed to reduce risk may, paradoxically, introduce new vulnerabilities.
At the heart of the problem lies the “base rate” issue. In corporate communications, actual misconduct may represent as little as 0.01% of all messages. Even a model boasting 99% accuracy can generate overwhelming volumes of false positives when scanning millions of emails. For example, in a dataset of 1,000,000 emails with just 10 genuine fraud cases, a 1% error rate could still produce 10,000 false alarms.
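The base-rate effect is easy to verify with the article's own numbers. The sketch below assumes, for illustration, that the model catches all 10 genuine cases (perfect recall) and mis-flags 1% of benign messages:

```python
# Base-rate sketch using the article's figures: 1,000,000 emails,
# 10 genuine fraud cases, and a 1% false positive rate on benign mail.
total_emails = 1_000_000
true_frauds = 10
fp_rate = 0.01  # 1% of benign emails incorrectly flagged

benign = total_emails - true_frauds
false_alarms = benign * fp_rate  # ~10,000 false alerts

# Even assuming the model catches all 10 frauds (perfect recall),
# precision -- the share of alerts that are real -- is tiny.
precision = true_frauds / (true_frauds + false_alarms)
print(f"False alarms: {false_alarms:.0f}, precision: {precision:.2%}")
```

In other words, roughly one alert in a thousand would point to real misconduct, even with an apparently strong model.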
This is why accuracy alone is a dangerously misleading metric. In low base-rate environments, a model can achieve near-perfect accuracy simply by classifying everything as normal. Such performance looks impressive on paper but fails to solve the real-world problem of detecting rare but critical events. In this context, “accuracy” becomes a vanity metric that obscures weaknesses in identifying the very cases that matter most.
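The vanity-metric point can be made concrete with a degenerate classifier that labels every message as normal, evaluated at the article's base rate:

```python
# A degenerate "classifier" that labels every message as normal.
# At the article's base rate (10 bad messages in 1,000,000),
# it scores 99.999% accuracy while detecting nothing at all.
total_messages = 1_000_000
bad_messages = 10

correct = total_messages - bad_messages  # every benign message is "correct"
accuracy = correct / total_messages
recall = 0 / bad_messages                # it never flags a single bad message
print(f"Accuracy: {accuracy:.3%}, recall: {recall:.0%}")
```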
The trade-off is unavoidable. Tightening model thresholds to reduce false positives risks missing genuine misconduct. Loosening thresholds to capture every potential issue inevitably increases noise. There is no perfect configuration—only a balance aligned with a firm’s risk appetite and operational capacity.
When vendors claim 99% accuracy or dramatic false positive reductions, firms must scrutinise how those metrics were calculated, what datasets were used and whether base rates reflect real-world conditions. In part two, we will examine practical techniques for reducing false positives and the critical questions compliance teams should pose before investing in AI-enabled supervision.
Find more on RegTech Analyst.
Copyright © 2026 FinTech Global