Saturday, April 25, 2026

Why Backtests Decay: Regime Dependence and Crowding

Backtesting is an essential part of quantitative strategy development, and naturally, strategies are often selected based on strong backtest performance. However, an important question when evaluating backtested strategies is how much of the results reflect skill rather than luck.

Reference [1] examines this issue by analyzing 1,726 commercially marketed strategies from ten global institutions over the period 2009 to 2025, covering equities, rates, foreign exchange, credit, and commodities. Each strategy is classified into one of seven categories: Carry, Hedging, Momentum, Multi Premia, Factor, Value, or Liquidity. The author pointed out,

This paper examines how institutional allocators should interpret marketed backtests of structured investment strategies. The analysis contributes in three ways. First, it quantifies the gap between pro-forma and live performance on a uniquely large commercial sample of 1,726 strategies from ten global institutions over 2009–2025. Second, it shows that once live performance is measured against a leave-one-out bucket-average peer benchmark, the residual information content of the marketed backtest is economically negligible: what looks like strategy-specific skill is predominantly the common factor regime prevailing at launch. Third, it identifies two structural channels—regime timing at launch and a horizon-dependent launch-density effect—that jointly explain the residual decay, and translates the result into an operational rule: the haircut applied to a marketed backtest should increase with the extremity of the pre-launch factor regime.

In summary, the results show that backtested strategies often experience significant performance decay in live trading, of approximately 2% to 3% per year. Most of the backtested performance is driven by factor regimes rather than true skill, with regime timing at launch and crowding (the launch-density effect) identified as the main drivers of decay.

This has important implications for allocators and system developers: strategies should be benchmarked against peers and adjusted for regime effects, because backtests often reflect the market environment rather than persistent alpha.
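To make the peer-benchmark idea concrete, here is a minimal Python sketch of a leave-one-out bucket-average benchmark: each strategy's live return is compared with the average live return of the other strategies in the same category bucket, so the strategy never contaminates its own benchmark. The function name, the tuple layout, and the sample returns are hypothetical illustrations, not the paper's code or data.

```python
from collections import defaultdict


def loo_excess_returns(strategies):
    """Live return minus the leave-one-out average of its bucket peers.

    strategies: list of (name, bucket, live_return) tuples (hypothetical layout).
    Returns {name: excess_return}; a strategy alone in its bucket gets None.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, bucket, ret in strategies:
        totals[bucket] += ret
        counts[bucket] += 1

    excess = {}
    for name, bucket, ret in strategies:
        n_peers = counts[bucket] - 1
        if n_peers == 0:
            excess[name] = None  # no peers to benchmark against
            continue
        # Remove the strategy itself before averaging its bucket
        peer_avg = (totals[bucket] - ret) / n_peers
        excess[name] = ret - peer_avg
    return excess


# Hypothetical annual live returns for three Carry and two Momentum strategies
sample = [
    ("carry_a", "Carry", 0.04),
    ("carry_b", "Carry", 0.02),
    ("carry_c", "Carry", 0.03),
    ("mom_a", "Momentum", -0.01),
    ("mom_b", "Momentum", 0.01),
]
print(loo_excess_returns(sample))
```

A strategy whose excess return is close to zero simply tracked its peers, however strong its marketed backtest looked; that is the sense in which the paper finds the residual information in backtests to be small.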

Let us know what you think in the comments below or in the discussion forum.

References

[1] Liu, C. (2026). Evaluating Structured Strategy Backtests: Peer Benchmarks, Regime Timing, and Live Performance. arXiv:2604.18821.

Post Source Here: Why Backtests Decay: Regime Dependence and Crowding



source https://harbourfronts.com/backtests-decay-regime-dependence-crowding/

Tuesday, April 21, 2026

Volatility Risk Premium Dynamics Through the Heston Framework

A significant amount of research has been conducted on the volatility risk premium (VRP). Reference [1] contributes to this literature by linking the VRP to the parameters of the Heston model. The Heston model is a widely used stochastic volatility model that captures time-varying volatility and mean-reverting dynamics.
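As a reminder of how the Heston parameters enter expected variance: under dv_t = κ(θ − v_t)dt + σ√v_t dW_t, the expected average variance over a horizon T is θ + (v0 − θ)(1 − e^(−κT))/(κT), which is the fair strike of a continuously monitored variance swap when the parameters are risk-neutral. The Python sketch below evaluates it at 7 and 30 days; the parameter values are illustrative assumptions, not estimates from [1].

```python
import math


def heston_expected_avg_variance(v0, kappa, theta, horizon_years):
    """E[(1/T) * integral_0^T v_t dt] under Heston dynamics.

    dv = kappa * (theta - v) dt + sigma * sqrt(v) dW. The vol of vol sigma
    does not enter this expectation; it only affects higher moments.
    """
    x = kappa * horizon_years
    if x == 0:
        return v0  # limit of the formula as kappa * T -> 0
    weight_on_v0 = (1.0 - math.exp(-x)) / x  # share of the gap v0 - theta that survives
    return theta + (v0 - theta) * weight_on_v0


# Illustrative parameters: current variance above its long-run level
for days in (7, 30):
    k_var = heston_expected_avg_variance(v0=0.06, kappa=3.0, theta=0.04, horizon_years=days / 365)
    print(f"{days} days: variance strike {k_var:.5f}, vol {math.sqrt(k_var):.4f}")
```

With these inputs the weight on v0 stays close to one at 7 days and declines somewhat at 30 days, which is consistent with the current variance level being the dominant state variable at short horizons.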

Unlike previous studies, the authors do not rely on a theoretical model to estimate the VRP. Instead, they use returns on variance-related instruments, namely variance swaps, VIX futures, and straddles, as proxies, and examine them over 7- and 30-day horizons. They pointed out,

The economic magnitude is substantial. A one-standard-deviation increase in v0 is associated with approximately 730 basis points lower next-day returns for the 7-day variance swap. This confirms that the current level of market variance is the primary driver of near-term VRP: when variance is elevated, the compensation demanded by investors for bearing variance risk increases, depressing expected returns on long-volatility positions…

This pattern supports Hypothesis 2: greater uncertainty about future variance dynamics leads to larger risk premia and more negative expected returns. The slightly weaker significance for 30-day straddles may reflect the attenuation of uncertainty effects at longer horizons, where the convex payoff structure provides some natural hedging against variance fluctuations…

At the 30-day horizon, the significance of κ diminishes substantially. For variance swaps and straddles, coefficients become statistically indistinguishable from zero. Only for VIX futures does κ retain marginal significance. This differential pattern across maturities—significance at 7 days but not at 30 days—is precisely what Hypothesis 3 predicts and provides validation of the underlying economic mechanism.

In summary, the results show that the initial variance level (v0) is a strong negative predictor of returns on long-volatility positions in all cases, volatility of volatility is also a negative and robust predictor, mean reversion (κ) is relevant only at the short (7-day) horizon, and the long-run parameters are largely irrelevant.

These findings suggest that the VRP is primarily driven by current volatility levels and uncertainty rather than long-term factors, providing useful insights for portfolio and risk management.

Let us know what you think in the comments below or in the discussion forum.

References

[1] Han, C.-H., & Wang, K. (2026). Variance Risk Premia under Volatility Models. Review of Quantitative Finance and Accounting.

Post Source Here: Volatility Risk Premium Dynamics Through the Heston Framework



source https://harbourfronts.com/volatility-risk-premium-dynamics-heston-framework/

Saturday, April 18, 2026

Multifractal Analysis of Herding and Inefficiency in Precious Metals

Gold and silver have been strong recently, with upward trends and increased volatility. Reference [1] studies the price dynamics of the leading precious metals (gold, silver, platinum, and palladium) over the period from December 9, 2019, to March 1, 2026, dividing the sample into pre- and post-COVID subperiods. The study employs Multifractal Detrended Fluctuation Analysis (MFDFA) to estimate the generalized Hurst exponent, the magnitude of long memory, and return predictability.
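For readers unfamiliar with MFDFA, the Python sketch below follows the standard recipe: build the cumulative profile of the series, split it into segments of size s (taken from both ends), remove a local polynomial trend from each segment, average the residual variances raised to the power q/2, and read the generalized Hurst exponent h(q) off the slope of log F_q(s) against log s. It uses NumPy only; the scales, q values, and test series are illustrative choices, not the settings used in [1].

```python
import numpy as np


def mfdfa(x, scales, qs, order=1):
    """Return generalized Hurst exponents h(q) for series x via MFDFA."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())  # step 1: cumulative profile
    n = len(profile)
    log_fq = np.empty((len(qs), len(scales)))

    for j, s in enumerate(scales):
        n_seg = n // s
        t = np.arange(s)
        # step 2: non-overlapping segments from the start and from the end
        segments = [profile[v * s:(v + 1) * s] for v in range(n_seg)]
        segments += [profile[n - (v + 1) * s:n - v * s] for v in range(n_seg)]
        # step 3: residual variance after removing a local polynomial trend
        f2 = np.array([
            np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
            for seg in segments
        ])
        # step 4: q-th order fluctuation function (q = 0 uses a log average)
        for i, q in enumerate(qs):
            if q == 0:
                log_fq[i, j] = 0.5 * np.mean(np.log(f2))
            else:
                log_fq[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q

    # step 5: h(q) is the slope of log F_q(s) versus log s
    log_s = np.log(scales)
    return np.array([np.polyfit(log_s, log_fq[i], 1)[0] for i in range(len(qs))])


rng = np.random.default_rng(0)
returns = rng.standard_normal(4096)  # stand-in for daily log returns
scales = np.array([16, 32, 64, 128, 256])
qs = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
h = mfdfa(returns, scales, qs)
for q, hq in zip(qs, h):
    print(f"h({q:+.0f}) = {hq:.3f}")
print(f"spectrum width h(qmin) - h(qmax) = {h[0] - h[-1]:.3f}")
```

For uncorrelated returns, h(2) should be near 0.5 and h(q) should barely vary with q; values of h(2) above 0.5 indicate persistence, and a wide spread between h(qmin) and h(qmax) indicates multifractality.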

The author pointed out,

The results based on the GHE estimates reveal that the selected precious metals exhibited varying degrees of multifractal properties throughout the sample period. The intensities of multifractal structures differ from coronavirus to its aftermath, but the trend is less pronounced for the latter phases of COVID-19 pandemic, except for gold. Additionally, the same attitude is not relevant for market efficiency where the herd-driven behavior strictly decreased for silver, platinum, and palladium after the outbreak but the market became increasingly inefficient for each asset. Therefore, the degree of herding behavior and market inefficiency differed markedly between the two sub-periods. It is noteworthy that changes in the average values of the multifractal spectrum indicate an increase in multifractality only for gold during the post-pandemic period. This means that the latter phase appears to have intensified herding behavior and market inefficiency in gold transactions. However, the MLM-based inefficiency index indicated that all selected assets became less efficient in the aftermath of COVID-19 pandemic. Within this framework, following the coronavirus disease, the returns of silver, platinum, and palladium - excluding gold – became increasingly predictable, indirectly suggesting reduced volatility.

…In line with the multifractal spectrum results, the evaluation of post-COVID-19 changes in herding behavior through the fractal dimension scale indicates that, during the given period in the precious metals market, herding increased only in gold, while the rest of the metal assets exhibited a decline in herding tendencies…In addition, the rise in the inefficiency index for all precious metals between the two periods indicates that they were exposed to wider market-wide effects related to transaction volumes.

A key contribution of the paper is the separation of herding behavior, measured by the Hurst exponent, from market inefficiency, measured by an inefficiency index, along with the construction of a predictability index.
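Given h(q) estimates, herding and inefficiency summaries are simple functions of them. The paper's exact formulas are not reproduced here; the Python sketch below assumes two common choices from the multifractal literature: the fractal dimension D = 2 − H for a self-affine series, and a market-deficiency style inefficiency measure that averages how far h at the smallest and largest q deviate from 0.5. The function names and h values are hypothetical.

```python
def fractal_dimension(hurst):
    """Assumed self-affine relation D = 2 - H.

    D = 1.5 when H = 0.5 (no long memory); D < 1.5 means a persistent,
    trend-following series, which herding studies tend to read as herding.
    """
    return 2.0 - hurst


def inefficiency_index(h_qmin, h_qmax):
    """Assumed deficiency-style measure: 0 when h(qmin) = h(qmax) = 0.5."""
    return 0.5 * (abs(h_qmin - 0.5) + abs(h_qmax - 0.5))


# Hypothetical h(q) values for one metal before and after a structural break
pre = {"h_qmin": 0.62, "h2": 0.53, "h_qmax": 0.45}
post = {"h_qmin": 0.70, "h2": 0.56, "h_qmax": 0.41}
for label, h in (("pre", pre), ("post", post)):
    print(
        label,
        "D =", round(fractal_dimension(h["h2"]), 3),
        "inefficiency =", round(inefficiency_index(h["h_qmin"], h["h_qmax"]), 3),
        "width =", round(h["h_qmin"] - h["h_qmax"], 3),
    )
```

Keeping the two measures separate is what allows a market to show less persistence (weaker herding) and yet a larger deviation from efficiency, which is the pattern reported for silver, platinum, and palladium.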

The results show that gold exhibits stronger herding post-COVID, while other metals show weaker herding, although all markets become more inefficient. In terms of predictability, silver, platinum, and palladium become more predictable post-COVID, whereas gold does not.

This is a noteworthy contribution, showing that the precious metals market does not follow a random walk, that its structure changed after COVID, and that gold behaves differently from the other metals, which provides useful insights for portfolio and risk management.

Let us know what you think in the comments below or in the discussion forum.

References

[1] Özdemir, O. (2026). Herding, Market Efficiency, and the Melting Pot Effect: Evidence from the Precious Metals Market. Preprints.org.

Article Source Here: Multifractal Analysis of Herding and Inefficiency in Precious Metals



source https://harbourfronts.com/multifractal-analysis-herding-inefficiency-precious-metals/

Wednesday, April 15, 2026

Overnight vs Daytime Returns in Sector ETFs

There is a noteworthy line of research that decomposes asset or strategy returns into daytime and overnight components. This type of decomposition has been discussed previously in the context of the volatility risk premium.

Reference [1] follows a similar approach, examining SPY and nine sector ETFs over the period 1999 to 2025. The study tests 24 simple strategies based on static long/short positions, momentum (“inertia”), and reversal rules, applied separately to daytime and overnight returns. The authors pointed out,

The results provide compelling evidence for the hypothesis that overnight periods generate stronger exploitable momentum than daytime periods. Strategy #1 (Long/Cash), which captures pure overnight returns R_i = R_CO,i, consistently outperformed across all ten ETFs with final values ranging from $435 (XLP) to $3165 (XLK). In contrast, Strategy #3 (Cash/Long), capturing pure daytime returns R_i = R_OC,i, generated losses in 8 out of 10 ETFs. This stark asymmetry contradicts the efficient market hypothesis and supports behavioral finance theories, which suggest reduced arbitrage activity during non-trading hours…

Conversely, the systematic failure of Strategy #2 (Short/Cash: R_i = −R_CO,i) across all ETFs ($2–$18 final values) demonstrates that overnight movements exhibit persistent positive drift rather than random walk behavior. If overnight returns were symmetrically distributed, short and long strategies would show comparable absolute performance, which the data clearly refute…

The enormous outperformance documented for Strategy #18 throughout this paper is therefore attributable to the temporal decomposition of the 24 h period into distinct overnight and daytime sub-periods, rather than to the specific direction of the conditioning signal. The structural asymmetry between overnight returns (persistent positive drift, low volatility, fat tails) and daytime returns (near-zero drift, higher volatility) is the economic substrate that makes sub-period strategies profitable…

In summary, the results show that overnight returns exhibit a persistent positive drift, while daytime returns are significantly weaker, and the best-performing strategies are those that maintain long exposure overnight. A simple overnight-only long strategy also performs well and often outperforms buy-and-hold before transaction costs. Also, autocorrelation analysis suggests that the edge lies in sub-period decomposition rather than long-memory forecasting.
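The decomposition behind these strategies can be reproduced from daily open and close prices: the overnight return runs from the previous close to today's open, and the daytime return from today's open to today's close. The Python sketch below compounds Strategy #1 (Long/Cash), #2 (Short/Cash), #3 (Cash/Long), and buy-and-hold from a hypothetical $100 starting value, ignoring costs; the prices are made up for illustration and the helper names are hypothetical.

```python
def overnight_daytime_returns(opens, closes):
    """Split close-to-close returns into overnight (R_CO) and daytime (R_OC) parts."""
    r_co = [opens[t] / closes[t - 1] - 1.0 for t in range(1, len(closes))]
    r_oc = [closes[t] / opens[t] - 1.0 for t in range(1, len(closes))]
    return r_co, r_oc


def compound(returns, start=100.0):
    """Grow a starting value by a sequence of simple returns."""
    value = start
    for r in returns:
        value *= 1.0 + r
    return value


def strategy_values(opens, closes, start=100.0):
    r_co, r_oc = overnight_daytime_returns(opens, closes)
    return {
        "#1 Long/Cash (overnight only)": compound(r_co, start),
        "#2 Short/Cash": compound([-r for r in r_co], start),
        "#3 Cash/Long (daytime only)": compound(r_oc, start),
        # overnight and daytime legs chained together equal close-to-close returns
        "buy-and-hold": compound([(1 + a) * (1 + b) - 1 for a, b in zip(r_co, r_oc)], start),
    }


# Made-up prices: every night gaps up, every day drifts down
opens = [100.0, 101.0, 101.5, 102.0]
closes = [100.5, 100.8, 101.3, 101.8]
for name, value in strategy_values(opens, closes).items():
    print(f"{name}: {value:.2f}")
```

Because the overnight and daytime legs multiply back to buy-and-hold, any drift that sits in one sub-period is split out rather than averaged away, which is the mechanism the authors credit for the outperformance.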

Note, however, that after accounting for transaction costs, returns decline, making these strategies mainly feasible for managers with low execution costs.
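A rough way to see why execution costs matter: an overnight-only position is bought at the close and sold at the next open, so it pays the one-way cost twice per trading day. Assuming 252 trading days and a constant proportional cost per side (both illustrative assumptions), the fraction of capital retained after a year is (1 − c)^(2 × 252), as in this short Python sketch.

```python
def annual_cost_drag(cost_per_side, trades_per_day=2, trading_days=252):
    """Fraction of capital retained after a year of paying a proportional cost per trade."""
    return (1.0 - cost_per_side) ** (trades_per_day * trading_days)


for bps in (0.5, 1, 2, 5):
    kept = annual_cost_drag(bps / 10_000)
    print(f"{bps} bp per side -> {1 - kept:.1%} of capital lost to costs per year")
```

At 5 bp per side, roughly 22% of capital is consumed each year before any return is earned, which is why the edge survives only for managers with very low execution costs.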

Let us know what you think in the comments below or in the discussion forum.

References

[1] Salotra, G., Katikireddy, T., Anumolu, Y., & Pinsky, E. (2026). A Comparative Analysis of Overnight vs. Daytime Static and Momentum Strategies Across Sector ETFs. Risks, 14, 84.

Originally Published Here: Overnight vs Daytime Returns in Sector ETFs



source https://harbourfronts.com/overnight-vs-daytime-returns-sector-etfs/

Sunday, April 12, 2026

Regime Classification Framework for Mean-Reverting and Trending Markets

Regime classification is important in asset and risk management. Traditional approaches classify regimes based on direction (bullish or bearish) and volatility (high or low). Reference [1] departs from this framework and instead classifies markets as mean-reverting or trending. Specifically, it uses return thresholds of 0.5%, 0.75%, and 1% to define regimes and examines SPY, QQQ, DIA, and IWM over the period 2000 to 2024.
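The post does not spell out how the thresholds map to labels. As a purely hypothetical illustration, one simple rule labels a day trending when the absolute close-to-close return exceeds the threshold and oscillating (mean-reverting) otherwise; the Python sketch below applies it at the three thresholds used in the study. The rule, function name, and prices are assumptions, and the paper's actual definition may differ, for example by using intraday range or multi-day returns.

```python
def label_regimes(closes, threshold):
    """Hypothetical rule: 1 = trending day (|return| > threshold), 0 = oscillating day."""
    labels = []
    for prev, curr in zip(closes[:-1], closes[1:]):
        ret = curr / prev - 1.0
        labels.append(1 if abs(ret) > threshold else 0)
    return labels


closes = [400.0, 402.4, 401.9, 398.0, 398.5, 403.1]  # made-up daily closes
for thr in (0.005, 0.0075, 0.01):
    labels = label_regimes(closes, thr)
    share = sum(labels) / len(labels)
    print(f"threshold {thr:.2%}: labels {labels}, trending share {share:.0%}")
```

Even this toy rule shows that raising the threshold shifts the class balance sharply, which is one reason the choice of threshold affects how reliable the classifiers look.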

The features include the VIX and technical indicators such as RSI and ATR, market data including returns, range, and volume, and macro events such as CPI releases, employment data, and FOMC meetings and projections. Three models are evaluated: Random Forest, Neural Network (MLP), and XGBoost. Validation uses a walk-forward framework with an expanding training set and one-step-ahead testing, including a 252-day evaluation period in 2024.
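The expanding-window, one-step-ahead scheme can be written as a small index generator. The Python sketch below is a generic version under stated assumptions: the function name and parameters are hypothetical, and the 252-observation test period mirrors the one-year evaluation described above. Any of the three classifiers would be refit on each training window before predicting the next day.

```python
def expanding_window_splits(n_obs, min_train, n_test):
    """Yield (train_indices, test_index) pairs for one-step-ahead walk-forward testing.

    The training set always starts at 0 and grows by one observation per step,
    so the model never sees the day it is asked to predict, or any later day.
    """
    first_test = n_obs - n_test
    if first_test < min_train:
        raise ValueError("not enough history for the requested test period")
    for test_idx in range(first_test, n_obs):
        yield range(0, test_idx), test_idx


splits = list(expanding_window_splits(n_obs=1000, min_train=500, n_test=252))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
print(len(splits), first_test, len(first_train), last_test, len(last_train))
```

Generating indices rather than slicing data keeps the scheme independent of the model library and makes it easy to verify that no test day leaks into its own training set.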

The author pointed out,

The results demonstrate consistent improvements over baseline classifiers, though performance varies meaningfully across ETFs and thresholds. For SPY (S&P 500), Neural Networks achieved a 15.4% improvement over the naive classifier at the 0.5% threshold, with AUC values reaching 0.67–0.74 at the 0.75% and 1% thresholds—the strongest and most statistically robust results in the study (bootstrap 95% CIs well above 0.5 at both thresholds). For IWM (Russell 2000), improvements ranged from 5.7% to 13.4% across thresholds, with Neural Network AUC values of 0.59 and 0.55 at the 0.5% and 0.75% thresholds (bootstrap CIs excluding 0.5), indicating genuine predictive power for small-cap market regime classification …

For QQQ (Nasdaq), the Neural Network achieved 4.7–6.1% improvement over the naive classifier, with AUC reaching 0.62 at the 1% threshold (bootstrap CI: [0.55, 0.69]). Performance at lower thresholds was weaker and the 0.5% threshold AUC CI marginally includes 0.5; practitioners should treat QQQ predictions at the 0.5% threshold with caution. These findings underscore that predicting oscillatory behavior in highly volatile, technology-concentrated indices is more challenging than in diversified large-cap indices, and that the choice of oscillation threshold materially affects the reliability of model predictions…
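The bootstrap confidence intervals quoted above can be reproduced in outline with NumPy: compute the AUC as the probability that a randomly chosen positive day scores above a randomly chosen negative day (ties count one half), then resample the test set with replacement and take percentiles. This is a generic percentile-bootstrap sketch with synthetic scores; the function names, resample count, and data are assumptions, not the paper's procedure or results.

```python
import numpy as np


def auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples containing one class are skipped."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        y = labels[idx]
        if y.min() == y.max():
            continue  # AUC is undefined without both classes
        stats.append(auc(y, scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic example: 252 test days with weakly informative scores
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=252)
y_score = 0.3 * y_true + rng.normal(size=252)
print("AUC:", round(auc(y_true, y_score), 3))
print("95% CI:", tuple(round(v, 3) for v in bootstrap_auc_ci(y_true, y_score)))
```

If the interval excludes 0.5, the classifier ranks days better than chance at that confidence level; an interval that straddles 0.5, as reported for QQQ at the 0.5% threshold, does not support that conclusion.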

In summary, the best case is a 15.4% improvement over the naive classifier for SPY, using a neural network at the 0.5% threshold. In many cases, however, the improvement is more modest, in the range of 1% to 5%, and it varies significantly across ETFs.

While the study has several limitations, it points to a promising research direction: predicting a magnitude-based regime appears slightly easier than predicting direction, and machine learning may be more effective as a risk or regime filter than as a direct alpha-generating signal.

Let us know what you think in the comments below or in the discussion forum.

References

[1] Azizi, S. (2026). Leveraging Machine Learning for Financial Forecasting: Distinguishing Market Trends from Oscillations in ETFs. Journal of Risk and Financial Management, 19(4), 262.

Article Source Here: Regime Classification Framework for Mean-Reverting and Trending Markets



source https://harbourfronts.com/regime-classification-framework-mean-reverting-trending-markets/

Tuesday, April 7, 2026

Variational Autoencoders in Volatility and Option Pricing

The Black–Scholes–Merton model is a groundbreaking and foundational framework in option pricing; however, it has well-known limitations. Several extensions have been developed to address these issues, including stochastic volatility and Lévy process-based models, which are largely parametric.

Reference [1] proposes a semi-parametric approach to overcome these limitations. Specifically, the model consists of three components:

  1. A Variational Autoencoder (VAE) that generates return distributions, capturing non-normal features such as skewness and fat tails, and uses a tail-weighted loss to better represent extreme events and produce realistic synthetic return paths,
  2. An implied volatility model based on LightGBM, which predicts volatility from option characteristics and market sentiment, capturing the volatility surface more effectively than a constant volatility assumption, and
  3. A pricing framework using Multi-Level Monte Carlo (MLMC), which simulates paths from the VAE-generated returns and predicted volatility, achieving similar accuracy at a lower computational cost than standard Monte Carlo methods.
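To illustrate the two-level MLMC idea in isolation, the Python sketch below prices a European call with NumPy. It swaps the paper's VAE-generated returns and LightGBM volatility for plain geometric Brownian motion with constant volatility, an assumption made only so the result can be checked against Black–Scholes. Level 0 estimates the price on a coarse time grid with many cheap paths; level 1 estimates the coarse-to-fine correction with fewer paths whose coarse increments are built from the same Brownian draws as the fine ones. Path counts and step sizes are arbitrary choices.

```python
import math

import numpy as np


def euler_call_payoffs(dw, s0, strike, r, sigma, dt):
    """Discounted call payoffs from Euler steps of dS = r S dt + sigma S dW."""
    s = np.full(dw.shape[0], s0)
    for k in range(dw.shape[1]):
        s = s * (1.0 + r * dt + sigma * dw[:, k])
    return math.exp(-r * dt * dw.shape[1]) * np.maximum(s - strike, 0.0)


def two_level_mlmc_call(s0, strike, r, sigma, t, coarse_steps=4, n0=200_000, n1=20_000, seed=0):
    rng = np.random.default_rng(seed)
    dt_c = t / coarse_steps
    dt_f = dt_c / 2.0

    # Level 0: many cheap paths on the coarse grid
    dw0 = rng.normal(0.0, math.sqrt(dt_c), size=(n0, coarse_steps))
    level0 = euler_call_payoffs(dw0, s0, strike, r, sigma, dt_c).mean()

    # Level 1: fewer paths; each coarse increment is the sum of two fine ones
    dw_f = rng.normal(0.0, math.sqrt(dt_f), size=(n1, 2 * coarse_steps))
    dw_c = dw_f[:, 0::2] + dw_f[:, 1::2]
    fine = euler_call_payoffs(dw_f, s0, strike, r, sigma, dt_f)
    coarse = euler_call_payoffs(dw_c, s0, strike, r, sigma, dt_c)
    level1 = (fine - coarse).mean()  # low-variance correction term

    return level0 + level1


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, strike, r, sigma, t):
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - strike * math.exp(-r * t) * norm_cdf(d2)


print(f"MLMC: {two_level_mlmc_call(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")
print(f"Black-Scholes: {black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")
```

The saving comes from the correction term: because fine and coarse paths share Brownian increments, their payoff difference has a small variance and needs far fewer samples than the level-0 estimate.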

The authors pointed out,

A VAE model was trained on historical log-returns of the NIFTY50 index to learn a latent representation of returns. This approach allowed us to capture complex, non-Gaussian structures in the return distribution and to generate synthetic return paths that reflect both normal and extreme market behavior. A quantile-based sampling strategy was adopted during data splitting to ensure the preservation of rare tail events in both training and validation sets. This made the model more resilient to imbalanced data and improved its ability to learn representative latent dynamics. Additionally, to estimate implied volatility, we utilized a LightGBM regression model trained on a diverse set of features, including option-specific variables such as strike price, moneyness, time to expiry and market-level sentiment indicators. The synthetic samples generated by the VAE were then used as inputs to a two-level MLMC simulation framework to estimate option prices more efficiently than conventional Monte Carlo methods. The MLMC approach reduced computational time significantly, while maintaining high fidelity in price estimation.

Empirical evaluation of our full pricing framework demonstrated that the proposed pipeline outperformed the classical Black-Scholes model across a wide range of market conditions. This improvement was consistent across both call and put options, as well as for different moneyness criteria. In particular, the model showed a clear edge in scenarios involving longer time to expiry.

In short, a VAE was trained to capture non-Gaussian features and generate realistic synthetic return paths. Implied volatility was estimated with a LightGBM model using option features and sentiment inputs, and these outputs were fed into a two-level MLMC framework for efficient option pricing. The approach reduced computational cost while maintaining accuracy, and it consistently outperformed the BSM model across market conditions, particularly for longer maturities.

This contribution is valuable and can be extended in several directions, for example, by applying the Variational Autoencoder framework to stress testing. However, the approach has limitations, including that it does not enforce no-arbitrage conditions.
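Since the pipeline does not enforce no-arbitrage conditions, a cheap safeguard is to check model prices after the fact. The Python sketch below tests two standard static conditions for call prices on a single expiry: prices must not increase with strike, and they must be convex in strike (otherwise a butterfly spread would have negative cost). The function name, tolerance, and sample prices are illustrative assumptions, and conditions across expiries, such as calendar spreads, are not covered.

```python
def static_arbitrage_violations(strikes, calls, tol=1e-9):
    """List static-arbitrage violations for call prices on one expiry (strikes must be distinct)."""
    pairs = sorted(zip(strikes, calls))
    ks = [k for k, _ in pairs]
    cs = [c for _, c in pairs]
    problems = []
    slopes = []
    for i in range(len(ks) - 1):
        slope = (cs[i + 1] - cs[i]) / (ks[i + 1] - ks[i])
        slopes.append(slope)
        if slope > tol:  # calls must be non-increasing in strike
            problems.append(f"call price rises between K={ks[i]} and K={ks[i + 1]}")
    for i in range(len(slopes) - 1):
        if slopes[i + 1] < slopes[i] - tol:  # convexity: slopes must not decrease
            problems.append(f"butterfly arbitrage around K={ks[i + 1]}")
    return problems


clean = static_arbitrage_violations([90, 100, 110], [14.0, 7.0, 3.0])
bad = static_arbitrage_violations([90, 100, 110], [14.0, 9.0, 3.0])
print(clean)
print(bad)
```

In the second example, buying the 90 and 110 calls and selling two 100 calls costs 14 − 18 + 3 = −1, so the butterfly would pay the trader upfront for a non-negative payoff, which is exactly the violation the check flags.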

Let us know what you think in the comments below or in the discussion forum.

References

[1] Sapna, S., & Mohan, B. R. (2026). Variational autoencoders for option pricing: A semi-parametric approach to eliminating traditional assumptions. Expert Systems With Applications, 321, 132216.

Post Source Here: Variational Autoencoders in Volatility and Option Pricing



source https://harbourfronts.com/variational-autoencoders-volatility-option-pricing/