EXANTE’s philosophy for fixing flaky test suites

May 5, 2026

As trading infrastructure scales, so does the burden of maintaining the automated tests designed to protect it. Vladimir Smirnov, a backend testing specialist at EXANTE, argues that the industry’s instinct to keep tests green at all costs may actually be making things worse.

Writing at least one automated test per task is mandatory at EXANTE, with engineers focusing on service tests that hit the API of a specific microservice running in a live environment alongside all its dependencies. In practice, tests rarely stop at one per feature, and as codebases grow, so do the jobs required to run them. Different teams own multiple microservices, each with its own CI/CD pipeline, and some services have grown substantial enough to warrant splitting test sets across separate jobs entirely.

The result is a steady accumulation of unstable failures, commonly called flakes, that stem from a range of causes: misconfigured environments, parallel test conflicts, race conditions, and dependencies on adjacent services. The danger, Smirnov contends, is not the flakes themselves but the habit of masking them through reruns or automatic retries. When engineers grow desensitised to low-value errors, genuinely critical failures can slip through unnoticed.

The solution EXANTE has implemented starts with a philosophical shift. Rather than treating tests as assets to be kept green, the team now views them as diagnostic tools whose primary purpose is to fail, and to fail in ways that point clearly to real problems. “Automated tests are not sacred,” Smirnov writes. “When a tool breaks, you replace it or throw it away.”

In practical terms, this means dedicating a rotating duty engineer specifically to test maintenance, entirely separate from feature development work. That engineer follows a structured triage process: failures are categorised by severity, bugs are filed, and known flakes are tagged with markers that allow them to be skipped or rerun conditionally without distracting the wider team. An automated script also tracks recurring failures that might otherwise be overlooked day to day. Quick wins are prioritised before deeper investigation begins.
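The article does not describe how EXANTE's recurring-failure script works. The Python below is only a minimal sketch of the general idea, under stated assumptions: CI can export the names of failed tests for each run, and the duty engineer keeps a set of tests already tagged as known flakes. The function name `find_recurring_failures`, the `min_failures` threshold and the test names in the usage example are all hypothetical. The sketch counts how many runs each test failed in, keeps the tests that crossed the threshold, and separates tagged flakes from untagged failures so that the untagged ones stand out.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def find_recurring_failures(
    runs: Iterable[Iterable[str]],
    min_failures: int = 3,
    known_flakes: frozenset[str] = frozenset(),
) -> dict[str, list[tuple[str, int]]]:
    """Hypothetical tracker: count the runs in which each test failed and
    split recurring failures into tagged known flakes and untagged ones."""
    counts: Counter[str] = Counter()
    for failed_tests in runs:
        # Count a test at most once per run, even if it was reported twice.
        counts.update(set(failed_tests))

    # Most frequent first; ties broken by name so the report is deterministic.
    recurring = [
        (name, n)
        for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if n >= min_failures
    ]
    return {
        "known_flakes": [(name, n) for name, n in recurring if name in known_flakes],
        "untagged": [(name, n) for name, n in recurring if name not in known_flakes],
    }


if __name__ == "__main__":
    # Hypothetical failure lists from five CI runs.
    runs = [
        ["test_orders_api", "test_quotes_ws"],
        ["test_orders_api"],
        ["test_orders_api", "test_quotes_ws", "test_orders_api"],
        ["test_quotes_ws"],
        ["test_positions"],
    ]
    report = find_recurring_failures(
        runs, min_failures=3, known_flakes=frozenset({"test_quotes_ws"})
    )
    print(report)
    # → {'known_flakes': [('test_quotes_ws', 3)], 'untagged': [('test_orders_api', 3)]}
```

Here `test_orders_api` failed in three of five runs but carries no flake tag, so it appears under `untagged`. That is the kind of repeated failure a daily skim might miss.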

For failures requiring deeper investigation, EXANTE uses a structured collaboration model: an initial solo analysis is followed by a group call to form hypotheses, then iterative testing before a final decision on remediation.

For more insights, read the full story here.
