
Eliminating Flaky Tests: A Systematic Approach

February 28, 2026 · 12 min read
Testing · QA · Flaky Tests · pytest · CI/CD · Selenium

A flaky test is one that sometimes passes and sometimes fails without any code changes. At a 10% flake rate, developers stop trusting the test suite. At 20%, they stop running it.

I've taken suites from 10% flaky to under 1%. Here's the systematic approach.

Step 1: Measure the Flake Rate

You can't fix what you don't measure. Track flakiness over time:

# Simple flake tracker in CI
import json
from datetime import datetime

def record_test_result(test_name, passed, run_id):
    with open('test_history.jsonl', 'a') as f:
        json.dump({
            'test': test_name,
            'passed': passed,
            'run_id': run_id,
            'timestamp': datetime.utcnow().isoformat()
        }, f)
        f.write('\n')

Run this for two weeks. Any test that fails more than twice without intervening code changes is flaky.
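
A short script can then mine that log for offenders. Here's a minimal sketch against the test_history.jsonl format above (the min_failures knob is my own, mirroring the "more than twice" rule):

# Sketch: surface flaky candidates from test_history.jsonl.
# min_failures mirrors the "fails more than twice" rule of thumb above.
import json
from collections import defaultdict

def find_flaky_tests(history_path='test_history.jsonl', min_failures=2):
    runs = defaultdict(int)
    failures = defaultdict(int)
    with open(history_path) as f:
        for line in f:
            record = json.loads(line)
            runs[record['test']] += 1
            if not record['passed']:
                failures[record['test']] += 1
    # Flaky candidates: fail sometimes, but not every time
    return {
        test: (count, runs[test])
        for test, count in failures.items()
        if count > min_failures and count < runs[test]
    }

for test, (fails, total) in sorted(find_flaky_tests().items()):
    print(f'{test}: failed {fails}/{total} runs')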

Step 2: Categorize the Flakes

In my experience, flaky tests fall into 5 categories:

| Category | % of Flakes | Example |
|---|---|---|
| Timing/async | 40% | Test checks element before it renders |
| Shared state | 25% | Test A writes data that breaks Test B |
| Network | 15% | External API times out |
| Randomness | 10% | Test uses random data that triggers edge cases |
| Environment | 10% | Different behavior on CI vs. local |

Step 3: Fix by Category

Timing: Use Explicit Waits, Not Sleep

# Bad: arbitrary sleep
import time

time.sleep(3)
assert element.is_visible()

# Good: explicit wait with condition
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "result"))
)
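
The same idea carries over to non-browser code. Where there's no WebDriverWait equivalent, a small polling helper does the job (a sketch; wait_until is my own helper, not a library API):

import time

# Sketch: poll a condition instead of sleeping for a fixed duration.
# wait_until is a hypothetical helper, not a pytest or Selenium API.
def wait_until(condition, timeout=10.0, interval=0.1):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f'condition not met within {timeout}s')

# Usage: returns as soon as the job finishes, instead of always burning 3s
# wait_until(lambda: job.status == 'done', timeout=5)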

Shared State: Isolate Every Test

# Each test gets its own database transaction that rolls back
import pytest
from sqlalchemy.orm import Session

@pytest.fixture(autouse=True)
def db_session(db):
    connection = db.engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)

    yield session

    session.close()
    transaction.rollback()
    connection.close()
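
With autouse=True the fixture wraps every test, so isolation is automatic. A quick sketch of what that buys you (User stands in for any model from the app under test):

# Sketch: tests pass in any order because each one's writes roll back.
# User is a placeholder model from the app under test.
def test_create_user(db_session):
    db_session.add(User(name='alice'))
    db_session.flush()  # visible inside this test's transaction only
    assert db_session.query(User).count() == 1

def test_table_starts_empty(db_session):
    # Holds regardless of execution order: nothing leaked from other tests
    assert db_session.query(User).count() == 0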

Network: Mock External Services

# Mock external APIs in tests (mocker comes from the pytest-mock plugin)
import pytest

@pytest.fixture
def mock_market_data(mocker):
    return mocker.patch(
        'services.alpaca.get_quote',
        return_value={'price': 150.00, 'volume': 1000000}
    )
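
A test then exercises the code path with a canned quote and never touches the network (a sketch; portfolio_value is a hypothetical function under test):

# Sketch: the fixture supplies the quote, so no real API call happens.
# portfolio_value is a hypothetical function under test.
def test_portfolio_value_uses_quote(mock_market_data):
    assert portfolio_value('AAPL', shares=10) == 1500.00
    mock_market_data.assert_called_once()  # the mock was hit, not Alpaca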

Randomness: Use Seeds

# Deterministic "random" data in tests
import pytest
from faker import Faker

@pytest.fixture
def fake():
    fake = Faker()
    fake.seed_instance(12345)  # Same data every run
    return fake
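
Seeding pays off when a test fails: rerunning it reproduces exactly the same data (a sketch; register_user is a hypothetical function under test):

# Sketch: seed 12345 yields identical fakes on every run, so failures replay.
# register_user is a hypothetical function under test.
def test_registration(fake):
    email = fake.email()  # same address every run
    user = register_user(name=fake.name(), email=email)
    assert user.email == email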

Step 4: Quarantine, Don't Delete

Don't delete flaky tests; quarantine them. They still catch real bugs sometimes. The flaky marker below comes from the pytest-rerunfailures plugin:

@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_websocket_reconnection():
    # This test is flaky due to WebSocket timing
    # Reruns 3 times with 2-second delay between attempts
    ...

Track quarantined tests separately. Fix them when you have time. But don't let them block deployments.
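
One cheap way to track the quarantine is a conftest.py hook that dumps every flaky-marked test at collection time (a sketch; the report filename is an assumption):

# Sketch for conftest.py: list quarantined tests on every run.
# quarantine_report.txt is an arbitrary filename choice.
def pytest_collection_modifyitems(config, items):
    quarantined = [item.nodeid for item in items
                   if item.get_closest_marker('flaky')]
    if quarantined:
        with open('quarantine_report.txt', 'w') as f:
            f.write('\n'.join(quarantined) + '\n')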

Step 5: Prevent New Flakes

Add a CI check that detects new flaky tests:

- name: Detect flaky tests
  run: |
    # Run the test suite 3 times
    for i in 1 2 3; do
      pytest tests/ --tb=line -q > results_$i.txt 2>&1 || true
    done
    # Compare results - any test that passed in one run
    # but failed in another is flaky
    python scripts/detect_flakes.py results_*.txt
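
The comparison script itself stays small. Here's a sketch of what scripts/detect_flakes.py could look like (it assumes pytest's "FAILED <nodeid>" summary lines appear in each results file):

# Sketch of scripts/detect_flakes.py: diff FAILED lines across the runs.
# Assumes pytest's short-summary "FAILED <nodeid>" lines are in each file.
import sys

def failed_tests(path):
    with open(path) as f:
        return {line.split()[1] for line in f if line.startswith('FAILED')}

def main(paths):
    runs = [failed_tests(p) for p in paths]
    # Flaky: failed in at least one run but not in all of them
    flaky = sorted(set().union(*runs) - set.intersection(*runs))
    for test in flaky:
        print(f'FLAKY: {test}')
    return 1 if flaky else 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))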

Results

| Metric | Before | After |
|---|---|---|
| Flaky rate | 10% | 0.8% |
| CI pass rate | 72% | 97% |
| Developer trust | "CI is broken again" | "If CI fails, there's a real bug" |
| Time to fix a flake | 2-4 hours | 15 minutes (with the categorized approach) |

The biggest win isn't the number — it's developer trust. When engineers trust the test suite, they run it. When they run it, they catch bugs before production.
