
Building a Backtesting Engine That Doesn't Lie to You

March 28, 2026 · 12 min read
Python · Backtesting · Trading · Quantitative · Risk Management

Every quantitative trader has had this experience: the backtest shows 200% annual returns; live trading delivers -15%.

The problem is almost never the strategy. It's the backtest. Most backtesting engines lie through optimistic assumptions.

The 5 Lies Most Backtests Tell

Lie 1: Perfect Fills

Most engines assume your order fills at the exact price you see. In reality:

  • Market orders fill at the ask (buying) or bid (selling), not the mid-price
  • Large orders move the market (slippage)
  • During volatility, fills can be 5-10 ticks worse than expected

My engine models this:

def simulate_fill(order, market_data):
    """Model a market-order fill: cross the spread, then pay additional slippage."""
    spread = market_data.ask - market_data.bid
    slippage = spread * 0.5  # Conservative: assume half the spread in extra slippage

    if order.side == 'BUY':
        fill_price = market_data.ask + slippage  # Buyers lift the ask, plus impact
    else:
        fill_price = market_data.bid - slippage  # Sellers hit the bid, minus impact

    return fill_price
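
For example, with a one-tick ES spread, the model fills a buy half a tick through the ask. The namedtuples below are just illustrative stand-ins for the engine's real order and quote types:

from collections import namedtuple

Order = namedtuple('Order', 'side')
MarketData = namedtuple('MarketData', 'bid ask')

fill = simulate_fill(Order('BUY'), MarketData(bid=4500.00, ask=4500.25))
print(fill)  # 4500.375: the ask plus half the 0.25 spread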

Lie 2: Unlimited Liquidity

Your backtest buys 10,000 shares instantly. In reality, that order takes minutes to fill and the price moves against you.

I cap position sizes relative to average volume:

max_position = daily_avg_volume * 0.01  # Never more than 1% of daily volume
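
Beyond the hard cap, it helps to model the fill itself taking time. A minimal sketch of participation-limited fills (the function and the max_participation parameter are illustrative, not part of the engine above):

def fill_over_bars(order_qty, bar_volumes, max_participation=0.05):
    """Fill at most a fixed fraction of each bar's volume until the order is done."""
    fills, remaining = [], order_qty
    for volume in bar_volumes:
        qty = min(remaining, int(volume * max_participation))
        if qty > 0:
            fills.append(qty)
            remaining -= qty
        if remaining <= 0:
            break
    return fills  # One entry per bar the order was working

print(fill_over_bars(10_000, [50_000, 40_000, 60_000, 80_000]))
# [2500, 2000, 3000, 2500]: four bars to work a 10,000-share order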

Lie 3: No Transaction Costs

Commissions, exchange fees, SEC fees, and financing costs add up fast. On ES futures, round-trip costs are ~$4.50 per contract; at 100 trades per day, that's $450 a day in friction.
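
Making that arithmetic explicit shows how quickly friction compounds (the $4.50 round trip is the figure above; the trade count is illustrative):

ROUND_TRIP_COST = 4.50   # ES futures: commissions + exchange fees, per contract
trades_per_day = 100
trading_days = 252

daily_friction = trades_per_day * ROUND_TRIP_COST  # $450 per day
annual_friction = daily_friction * trading_days
print(f"Annual friction: ${annual_friction:,.0f}")  # Annual friction: $113,400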

Lie 4: Look-Ahead Bias

The most dangerous lie. If your indicators use tomorrow's data to make today's decision, your backtest will look incredible and your live trading will be random.

I enforce strict temporal ordering: every signal at time T uses only data from T-1 and earlier.
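
In pandas terms, the classic mistake is trading on an indicator computed through the current bar. A minimal sketch of the fix, assuming a simple moving-average cross (the indicator choice is illustrative):

import pandas as pd

def ma_cross_signal(close: pd.Series) -> pd.Series:
    """Signal for bar T computed strictly from bars T-1 and earlier."""
    fast = close.rolling(10).mean()
    slow = close.rolling(50).mean()
    raw = (fast > slow).astype(int)
    return raw.shift(1)  # The shift enforces the T-1 rule: no same-bar data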

Lie 5: Survivorship Bias

If you're testing stock strategies, you're probably testing on stocks that survived to today. The ones that went bankrupt aren't in your dataset. This inflates returns.
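
One practical defense is a point-in-time universe: on any decision date, include only the symbols that were actually listed then, with delisted names included up to their delisting. A sketch assuming you have listing metadata (the DataFrame layout is illustrative):

import pandas as pd

def active_universe(listings: pd.DataFrame, as_of: pd.Timestamp) -> list:
    """listings columns: symbol, listed, delisted (NaT if still trading)."""
    alive = (listings['listed'] <= as_of) & (
        listings['delisted'].isna() | (listings['delisted'] > as_of)
    )
    return listings.loc[alive, 'symbol'].tolist()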

The Engine Architecture

class BacktestEngine:
    def __init__(self, strategy, data, config):
        self.strategy = strategy
        self.data = data
        self.broker = SimulatedBroker(config)
        self.risk_manager = RiskManager(config)  # Sizes orders in run() below
        self.portfolio = Portfolio(config.initial_capital)

    def run(self):
        for timestamp, bar in self.data.iterrows():
            # 1. Update portfolio with fills from previous bar
            self.broker.process_fills(bar)

            # 2. Strategy generates signals using PREVIOUS bar data
            signal = self.strategy.on_bar(
                bar=self.data.loc[:timestamp].iloc[:-1],  # Exclude current bar
                portfolio=self.portfolio
            )

            # 3. Convert signals to orders with position sizing
            if signal:
                order = self.risk_manager.size_order(
                    signal, self.portfolio, bar
                )
                self.broker.submit(order)

            # 4. Record state for analysis
            self.portfolio.record_snapshot(timestamp)

Key design decisions:

  • Event-driven, not vectorized: Each bar is processed sequentially. Slower, but guarantees temporal correctness.
  • Strategy only sees past data: The iloc[:-1] ensures no look-ahead.
  • Broker simulates realistic fills: Slippage, commissions, partial fills.
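
Wiring it together is straightforward. A hypothetical usage sketch (BacktestConfig, load_bars, and the strategy are illustrative stand-ins; only BacktestEngine comes from the code above):

from dataclasses import dataclass

@dataclass
class BacktestConfig:
    initial_capital: float = 100_000.0

class BuyFirstBar:
    def on_bar(self, bar, portfolio):
        # `bar` is the history up to T-1; signal once at the start, then hold.
        return 'BUY' if len(bar) == 0 else None

data = load_bars('ES', '2019-01-01', '2024-12-31')  # hypothetical data loader
engine = BacktestEngine(BuyFirstBar(), data, BacktestConfig())
engine.run()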

Metrics That Matter

I report these metrics using quantstats and empyrical:

Metric             What It Tells You            Red Flag Threshold
Sharpe Ratio       Risk-adjusted return         Below 1.0
Max Drawdown       Worst peak-to-trough loss    Above 25%
Win Rate           % of winning trades          Below 40%
Profit Factor      Gross profit / gross loss    Below 1.5
Expectancy         Average $ per trade          Below $0 (obviously)
Recovery Factor    Net profit / max drawdown    Below 3.0

If your Sharpe is above 3.0 in a backtest, you're probably overfitting. Real-world Sharpes for systematic strategies are typically 0.8-2.0.
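
Given a daily returns series, both libraries make these numbers one-liners. A sketch with synthetic returns standing in for the portfolio snapshots recorded by the engine:

import numpy as np
import pandas as pd
import quantstats as qs
from empyrical import max_drawdown, sharpe_ratio

# Synthetic daily returns; in practice, derive these from Portfolio snapshots
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0005, 0.01, 252),
                    index=pd.bdate_range('2023-01-02', periods=252))

print("Sharpe:", sharpe_ratio(returns))        # empyrical annualizes daily data
print("Max DD:", max_drawdown(returns))
print("Sharpe (quantstats):", qs.stats.sharpe(returns))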

Walk-Forward Optimization

I never optimize parameters on the full dataset. Instead:

  1. Train on 2019-2021
  2. Validate on 2022
  3. Test on 2023
  4. Re-train on 2020-2022
  5. Validate on 2023
  6. Test on 2024

This walk-forward approach ensures the strategy generalizes to unseen data. If it only works on one specific period, it's curve-fit.
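
A minimal sketch of generating those rolling splits (the year lists match the schedule above; the helper itself is illustrative):

def walk_forward_windows(years, train=3, validate=1, test=1):
    """Yield (train, validate, test) year tuples, rolling forward one year at a time."""
    span = train + validate + test
    for start in range(len(years) - span + 1):
        w = years[start:start + span]
        yield w[:train], w[train:train + validate], w[train + validate:]

for tr, va, te in walk_forward_windows([2019, 2020, 2021, 2022, 2023, 2024]):
    print(f"train {tr}  validate {va}  test {te}")
# train [2019, 2020, 2021]  validate [2022]  test [2023]
# train [2020, 2021, 2022]  validate [2023]  test [2024]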

The Bottom Line

A good backtesting engine is one that makes your strategies look worse than they are. If your backtest results are conservative and your live trading beats them, you've built a trustworthy system.
