Forecast collapse of transformer-based models under squared loss in financial time series

Pierre Andreoletti (IDP)

arXiv:2604.00064·stat.ML·April 2, 2026

TL;DR

This paper analyzes why transformer models underperform in financial time series forecasting, showing that increased model complexity leads to higher variance without accuracy gains, especially in degenerate regimes.

Contribution

It provides a theoretical explanation for the collapse of transformer-based forecasts in finance, highlighting the role of noise reuse and prediction variance.

Findings

01

Transformers exhibit higher prediction errors than linear models on most forecasting windows.

02

In degenerate regimes, increased model expressivity does not reduce bias but increases variance.

03

Numerical experiments on EUR/USD data support the theoretical variance-driven degradation mechanism.
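The bias and variance language in these findings can be made concrete with the standard squared-loss decomposition. The notation below is an assumption introduced for illustration and is not taken from the paper: $Y$ is the future value, $X$ the observed past, $m(x) = \mathbb{E}[Y \mid X = x]$ the Bayes-optimal predictor, $\hat f_D$ a model fitted on a training sample $D$ drawn independently of the test pair $(X, Y)$, and $\bar f(x) = \mathbb{E}_D[\hat f_D(x)]$ its average over training samples.

```latex
\begin{align*}
\mathbb{E}\big[(Y - \hat f_D(X))^2\big]
  &= \underbrace{\mathbb{E}\big[(Y - m(X))^2\big]}_{\text{irreducible noise}}
   + \underbrace{\mathbb{E}\big[(m(X) - \bar f(X))^2\big]}_{\text{squared bias}}
   + \underbrace{\mathbb{E}\big[\operatorname{Var}_D\big(\hat f_D(X)\big)\big]}_{\text{variance}} .
\end{align*}
% Degenerate regime for returns (assumed here): m(X) = 0, so the constant
% predictor f = 0 already has zero bias and zero variance. Any fitted model
% can at best match the noise term, and every fluctuation of \hat f_D
% around zero adds to the variance term.
```

Read this way, finding 02 says that in the degenerate regime the squared-bias term has nothing left to give, so extra expressivity can only show up in the variance term.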

Abstract

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results…
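The variance-driven mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's experiment: returns are synthetic i.i.d. Gaussian noise rather than EUR/USD data, and an ordinary least-squares autoregression with a growing number of lags stands in for a more expressive model (it is not a Transformer). The function name `excess_risk_demo` and all of its parameters are hypothetical.

```python
import numpy as np


def excess_risk_demo(lags=(1, 5, 20, 60), n_train=500, n_test=5000,
                     sigma=0.01, n_trials=20, seed=0):
    """Compare the zero forecast with OLS autoregressions on pure-noise returns."""
    rng = np.random.default_rng(seed)
    out = {}
    for p in lags:
        mse_zero, mse_model, pred_var = [], [], []
        for _ in range(n_trials):
            # Synthetic returns with zero conditional mean (assumption, not real data).
            r = rng.normal(0.0, sigma, size=n_train + n_test + p)
            # Column k holds the return lagged by k + 1 steps.
            X = np.column_stack([r[p - k - 1: len(r) - k - 1] for k in range(p)])
            y = r[p:]
            X_tr, y_tr = X[:n_train], y[:n_train]
            X_te, y_te = X[n_train:], y[n_train:]
            # Least-squares fit: with no signal present, it can only fit noise.
            beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
            pred = X_te @ beta
            mse_zero.append(np.mean(y_te ** 2))            # Bayes-optimal forecast
            mse_model.append(np.mean((y_te - pred) ** 2))  # fitted forecast
            pred_var.append(np.var(pred))                  # spurious fluctuation
        out[p] = {
            "mse_zero": float(np.mean(mse_zero)),
            "mse_model": float(np.mean(mse_model)),
            "pred_var": float(np.mean(pred_var)),
        }
    return out


if __name__ == "__main__":
    for p, stats in excess_risk_demo().items():
        print(p, stats)
```

In this toy setting the zero forecast is Bayes-optimal by construction, so any gap between `mse_model` and `mse_zero` comes from fluctuations of the fitted forecast around zero, and those fluctuations grow with the number of lags.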
