Eliminating Flaky Tests: A Systematic Approach
A flaky test is a test that sometimes passes and sometimes fails without any code changes. At a 10% flake rate, developers stop trusting the test suite. At 20%, they stop running it.
I've taken suites from 10% flaky to under 1%. Here's the systematic approach.
Step 1: Measure the Flake Rate
You can't fix what you don't measure. Track flakiness over time:
```python
# Simple flake tracker in CI
import json
from datetime import datetime

def record_test_result(test_name, passed, run_id):
    # Append one JSON record per test result so history survives across runs
    with open('test_history.jsonl', 'a') as f:
        json.dump({
            'test': test_name,
            'passed': passed,
            'run_id': run_id,
            'timestamp': datetime.utcnow().isoformat()
        }, f)
        f.write('\n')
```
Run this for two weeks. Any test that fails more than twice without code changes is flaky.
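Once you have that history, a short script can turn `test_history.jsonl` into a per-test report. A minimal sketch; the threshold mirrors the rule above, and everything beyond the tracker's field names is my own:

```python
# Summarize flake candidates from the history written by record_test_result
import json
from collections import defaultdict

def flake_report(path='test_history.jsonl', min_failures=3):
    results = defaultdict(list)
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            results[rec['test']].append(rec['passed'])
    for test, outcomes in sorted(results.items()):
        failures = outcomes.count(False)
        # Flaky = failed more than twice AND also passed at least once
        if failures >= min_failures and failures < len(outcomes):
            rate = failures / len(outcomes)
            print(f'{test}: {failures}/{len(outcomes)} failed ({rate:.0%})')

if __name__ == '__main__':
    flake_report()
```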
Step 2: Categorize the Flakes
In my experience, flaky tests fall into 5 categories:
| Category | % of Flakes | Example |
|---|---|---|
| Timing/async | 40% | Test checks element before it renders |
| Shared state | 25% | Test A writes data that breaks Test B |
| Network | 15% | External API times out |
| Randomness | 10% | Test uses random data that triggers edge cases |
| Environment | 10% | Different behavior on CI vs local |
Step 3: Fix by Category
Timing: Use Explicit Waits, Not Sleep
```python
# Bad: arbitrary sleep -- too short on a slow CI box, wasted time on a fast one
time.sleep(3)
assert element.is_displayed()

# Good: explicit wait with a condition, polling for up to 10 seconds
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "result"))
)
```
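The same pattern works outside the browser: poll a condition with a deadline instead of sleeping. Here is a generic helper sketch; the name `wait_until` and its defaults are assumptions of mine, not a library API:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError(f'condition not met within {timeout}s')
        time.sleep(interval)

# Usage: replaces time.sleep(3) before checking a background job
# wait_until(lambda: job.status == 'done', timeout=5)
```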
Shared State: Isolate Every Test
```python
import pytest
from sqlalchemy.orm import Session

# Each test gets its own database transaction that rolls back
@pytest.fixture(autouse=True)
def db_session(db):
    connection = db.engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()  # undo everything the test wrote
    connection.close()
```
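To see the isolation in action, here is what a pair of otherwise order-dependent tests looks like with the fixture in place (the `User` model is a hypothetical example):

```python
# Both tests run against a clean table because each test's writes
# happen inside a transaction the fixture rolls back.
def test_create_user(db_session):
    db_session.add(User(email='a@example.com'))
    db_session.flush()  # visible inside this test's transaction only
    assert db_session.query(User).count() == 1

def test_table_starts_empty(db_session):
    # Passes in any order: the previous test's insert was rolled back
    assert db_session.query(User).count() == 0
```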
Network: Mock External Services
```python
import pytest

# Mock external APIs in tests (uses the pytest-mock plugin's `mocker` fixture)
@pytest.fixture
def mock_market_data(mocker):
    return mocker.patch(
        'services.alpaca.get_quote',
        return_value={'price': 150.00, 'volume': 1000000}
    )
```
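A test that consumes the fixture never touches the network, so a slow or down upstream can't fail it. A sketch; `calculate_position_value` and its signature are hypothetical:

```python
def test_position_value_is_deterministic(mock_market_data):
    # get_quote is patched, so the quote is always the canned one above
    value = calculate_position_value('AAPL', shares=10)
    assert value == 1500.00  # 10 shares * $150.00
    mock_market_data.assert_called_once()
```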
Randomness: Use Seeds
```python
import pytest
from faker import Faker

# Deterministic "random" data in tests
@pytest.fixture
def fake():
    f = Faker()
    f.seed_instance(12345)  # Same data every run
    return f
```
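The same rule applies to every other randomness source the test touches, including the standard library. A small sketch; the fixture name `seeded_random` is my own:

```python
import random

import pytest

@pytest.fixture
def seeded_random():
    random.seed(12345)  # stdlib random is now reproducible for this test
    yield
    random.seed()       # restore entropy-based seeding afterwards
```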
Step 4: Quarantine, Don't Delete
Don't delete flaky tests — quarantine them. They still catch real bugs sometimes:
```python
# Requires the pytest-rerunfailures plugin
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_websocket_reconnection():
    # This test is flaky due to WebSocket timing.
    # It reruns up to 3 times with a 2-second delay between attempts.
    ...
```
Track quarantined tests separately. Fix them when you have time. But don't let them block deployments.
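One way to track them separately is a custom pytest marker; the `quarantine` name is a convention I'm assuming here, not a pytest built-in:

```python
# pytest.ini (registering the marker avoids "unknown marker" warnings):
# [pytest]
# markers =
#     quarantine: known-flaky test, tracked but non-blocking

import pytest

@pytest.mark.quarantine
def test_websocket_reconnection():
    ...
```

The deploy gate then runs `pytest -m "not quarantine"`, while a separate non-blocking job runs `pytest -m quarantine` and only reports results.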
Step 5: Prevent New Flakes
Add a CI check that detects new flaky tests:
```yaml
- name: Detect flaky tests
  run: |
    # Run the test suite 3 times
    for i in 1 2 3; do
      pytest tests/ --tb=line -q > results_$i.txt 2>&1 || true
    done
    # Compare results: any test that passed in one run
    # but failed in another is flaky
    python scripts/detect_flakes.py results_*.txt
```
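Here is one possible shape for `scripts/detect_flakes.py`, assuming the result files contain pytest's short-summary lines like `FAILED tests/test_x.py::test_y` (pass `-ra` to pytest if they don't appear):

```python
# scripts/detect_flakes.py -- a sketch, not the exact script referenced above
import sys

def failed_tests(path):
    # Collect node IDs from pytest's "FAILED <nodeid> - ..." summary lines
    with open(path) as f:
        return {line.split()[1] for line in f if line.startswith('FAILED')}

def main(paths):
    runs = [failed_tests(p) for p in paths]
    always_failed = set.intersection(*runs)   # consistent failures: real bugs
    flaky = set.union(*runs) - always_failed  # failed sometimes, passed sometimes
    for test in sorted(flaky):
        print(f'FLAKY: {test}')
    return 1 if flaky else 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```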
Results
| Metric | Before | After |
|---|---|---|
| Flaky rate | 10% | 0.8% |
| CI pass rate | 72% | 97% |
| Developer trust | "CI is broken again" | "If CI fails, there's a real bug" |
| Time to fix a flake | 2-4 hours | 15 minutes (categorized approach) |
The biggest win isn't the number — it's developer trust. When engineers trust the test suite, they run it. When they run it, they catch bugs before production.