Backtesting is an essential part of quantitative strategy development, and naturally, strategies are often selected based on strong backtest performance. However, an important question when evaluating a backtested strategy is how much of its performance reflects skill rather than luck.
Reference [1] examines this issue by analyzing 1,726 commercially marketed strategies from ten global institutions over the period 2009 to 2025, covering equities, rates, foreign exchange, credit, and commodities. Each strategy is classified into one of seven categories: Carry, Hedging, Momentum, Multi Premia, Factor, Value, or Liquidity. The author points out:
This paper examines how institutional allocators should interpret marketed backtests of structured investment strategies. The analysis contributes in three ways. First, it quantifies the gap between pro-forma and live performance on a uniquely large commercial sample of 1,726 strategies from ten global institutions over 2009–2025. Second, it shows that once live performance is measured against a leave-one-out bucket-average peer benchmark, the residual information content of the marketed backtest is economically negligible: what looks like strategy-specific skill is predominantly the common factor regime prevailing at launch. Third, it identifies two structural channels—regime timing at launch and a horizon-dependent launch-density effect—that jointly explain the residual decay, and translates the result into an operational rule: the haircut applied to a marketed backtest should increase with the extremity of the pre-launch factor regime.
In summary, the results show that backtested strategies often experience significant performance decay in live trading, on the order of 2% to 3% per year. Most of the backtested performance is driven by factor regimes rather than true skill, with regime timing at launch and crowding identified as the main drivers of decay.
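The leave-one-out bucket-average peer benchmark mentioned above is simple to sketch: each strategy's live return is compared against the average live return of all other strategies in its category, so that common regime effects net out and only the strategy-specific residual remains. The function and sample numbers below are illustrative, not taken from [1]:

```python
import numpy as np

def loo_peer_excess(live_returns, buckets):
    """Leave-one-out bucket-average peer benchmark (illustrative sketch).

    For each strategy, the benchmark is the mean live return of the
    OTHER strategies in its bucket (category); the residual is the
    strategy's return minus that peer average.
    """
    live_returns = np.asarray(live_returns, dtype=float)
    buckets = np.asarray(buckets)
    residuals = np.empty_like(live_returns)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        total = live_returns[idx].sum()
        n = len(idx)
        for i in idx:
            if n > 1:
                # exclude strategy i from its own benchmark
                peer_avg = (total - live_returns[i]) / (n - 1)
                residuals[i] = live_returns[i] - peer_avg
            else:
                # lone strategy in its bucket: no peers to net against
                residuals[i] = live_returns[i]
    return residuals

# Hypothetical annualized live returns (%) for five strategies
rets = [4.0, 5.0, 3.0, 8.0, 7.5]
cats = ["Momentum", "Momentum", "Momentum", "Carry", "Carry"]
print(loo_peer_excess(rets, cats))  # [ 0.   1.5 -1.5  0.5 -0.5]
```

Under this benchmarking, a strategy that merely rode the same regime as its peers shows a residual near zero, which is exactly the paper's point about the marketed backtest carrying little strategy-specific information.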
This has important implications for allocators and system developers: strategies should be benchmarked against peers and adjusted for regime effects, given that backtests often reflect the environment at launch rather than persistent alpha.
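The paper's operational rule is that the haircut applied to a marketed backtest should increase with the extremity of the pre-launch factor regime. One minimal way to express such a rule is as a haircut that grows with the absolute z-score of the pre-launch regime; the functional form and all parameter values below are our own illustrative assumptions, not those estimated in [1]:

```python
def backtest_haircut(backtest_sharpe, regime_z,
                     base_haircut=0.3, slope=0.15, cap=0.9):
    """Hypothetical haircut rule: discount a marketed backtest more
    heavily the more extreme the pre-launch factor regime.

    backtest_sharpe : Sharpe ratio reported in the marketed backtest
    regime_z        : z-score of the pre-launch factor regime
                      (0 = typical environment, large |z| = extreme)
    """
    haircut = min(base_haircut + slope * abs(regime_z), cap)
    return backtest_sharpe * (1.0 - haircut)

# A backtest launched in a typical regime vs. an extreme one
print(round(backtest_haircut(1.2, regime_z=0.0), 2))  # 0.84 (30% base haircut)
print(round(backtest_haircut(1.2, regime_z=3.0), 2))  # 0.3 (75% haircut)
```

The cap keeps the rule from discounting a backtest entirely; in practice the base, slope, and cap would be calibrated from the observed pro-forma-to-live decay, which is precisely the kind of estimate [1] provides.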
Let us know what you think in the comments below or in the discussion forum.
References
[1] Chang Liu (2026), Evaluating Structured Strategy Backtests: Peer Benchmarks, Regime Timing, and Live Performance, arXiv:2604.18821
source https://harbourfronts.com/backtests-decay-regime-dependence-crowding/