Forecast collapse of transformer-based models under squared loss in financial time series
Pierre Andreoletti (IDP)

TL;DR
This paper analyzes why transformer models underperform in financial time series forecasting, showing that increased model complexity leads to higher variance without accuracy gains, especially in degenerate regimes.
Contribution
It provides a theoretical explanation for the collapse of transformer-based forecasts in finance, highlighting the role of noise reuse and prediction variance.
Findings
Transformers exhibit higher prediction errors than linear models on most forecasting windows.
In degenerate regimes, increased model expressivity does not reduce bias but increases variance.
Numerical experiments on EUR/USD data support the theoretical variance-driven degradation mechanism.
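The variance-driven mechanism in these findings can be illustrated with a small simulation. The sketch below is not the paper's experiment: it assumes i.i.d. Gaussian returns (so the Bayes-optimal forecast under squared loss is exactly zero) and uses the number of lags in a least-squares autoregression as a hypothetical stand-in for model expressivity. The names lagged_design and out_of_sample_mse are illustrative only.

```python
import numpy as np

def lagged_design(r, p):
    """Build a design matrix whose rows hold p consecutive returns;
    the target is the return that immediately follows each row."""
    X = np.column_stack([r[i:len(r) - p + i] for i in range(p)])
    y = r[p:]
    return X, y

def out_of_sample_mse(p, n_train=200, n_test=5000, sigma=1.0, seed=0):
    """Return (MSE of the zero predictor, MSE of a p-lag least-squares fit).

    Assumption: returns are i.i.d. N(0, sigma^2), so the conditional mean
    is zero and no predictor can beat the zero forecast in expectation.
    """
    rng = np.random.default_rng(seed)
    r_train = rng.normal(0.0, sigma, n_train + p)
    r_test = rng.normal(0.0, sigma, n_test + p)

    X_tr, y_tr = lagged_design(r_train, p)
    X_te, y_te = lagged_design(r_test, p)

    # The fitted coefficients contain only training noise: there is no
    # signal to recover, so they add variance without reducing bias.
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

    mse_model = float(np.mean((y_te - X_te @ beta) ** 2))
    mse_zero = float(np.mean(y_te ** 2))
    return mse_zero, mse_model

if __name__ == "__main__":
    for p in (1, 10, 50):
        mse_zero, mse_model = out_of_sample_mse(p)
        print(f"lags={p:3d}  zero predictor MSE={mse_zero:.3f}  fitted model MSE={mse_model:.3f}")
```

Under these assumptions, the gap between the fitted model and the zero predictor should widen as the number of lags grows, since every additional coefficient is fitted to noise.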
Abstract
We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results…
