Retrieval Mechanisms Surpass Long-Context Scaling in Time Series Forecasting
Rishi Ahuja, Kumar Prateek, Simranjit Singh, Vijay Kumar

TL;DR
This paper demonstrates that in time series forecasting, longer context windows can degrade performance due to irrelevant noise, and proposes retrieval-augmented methods that outperform traditional long-context models.
Contribution
The work challenges the assumption that more historical data improves forecasting and introduces retrieval-augmented forecasting as a more effective alternative.
Findings
Forecasting error increases as the context window grows, an inverse scaling law: a 3,000-step window degrades performance by over 68%.
Retrieval-augmented forecasting (RAFT) outperforms long-context models and foundation models on ETTh1.
Selective retrieval of relevant historical segments improves forecasting accuracy.
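The idea of selectively retrieving relevant historical segments can be illustrated with a minimal sketch. This is not the authors' RAFT implementation. The function name `retrieve_and_forecast`, the use of Pearson correlation as the similarity score, the offset removal, and the plain averaging of retrieved continuations are all assumptions chosen for illustration.

```python
import numpy as np


def _zscore(x, eps=1e-8):
    # Standardize along the last axis; eps guards against constant windows.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)


def retrieve_and_forecast(history, query_len, horizon, k=3):
    """Hypothetical helper: forecast `horizon` steps by averaging what
    followed the k past windows most similar to the latest window."""
    history = np.asarray(history, dtype=float)
    n = len(history)
    # A candidate window needs `horizon` already-observed values after it.
    last_start = n - query_len - horizon
    if last_start < 0:
        raise ValueError("history is too short for this query_len and horizon")

    query = history[-query_len:]
    windows = np.lib.stride_tricks.sliding_window_view(history, query_len)
    keys = windows[: last_start + 1]

    # Pearson correlation between each candidate window and the query
    # (assumed similarity measure).
    scores = _zscore(keys) @ _zscore(query) / query_len
    top = np.argsort(scores)[::-1][: min(k, len(keys))]

    # Continuations are taken relative to the last value of their own window,
    # then re-anchored at the last observed value of the query.
    futures = np.stack([
        history[s + query_len : s + query_len + horizon] - history[s + query_len - 1]
        for s in top
    ])
    return query[-1] + futures.mean(axis=0)
```

Note the design choice: the model only ever sees a short query window plus a handful of retrieved segments, so history that does not resemble the current pattern never enters the forecast, however long the stored series is.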
Abstract
Time Series Foundation Models (TSFMs) have borrowed the long-context paradigm from natural language processing on the premise that feeding more history into the model improves forecast quality. In stochastic domains, however, distant history is often just high-frequency noise rather than signal. This work tests whether the premise holds by evaluating continuous-context architectures (PatchTST included) on the ETTh1 benchmark. The results contradict it: an inverse scaling law emerges, with forecasting error rising as context gets longer. A 3,000-step window degrades performance by over 68%, evidence that attention mechanisms are poor at ignoring irrelevant historical volatility. Retrieval-Augmented Forecasting (RAFT) is evaluated as an alternative. RAFT achieves a mean squared error (MSE) of 0.379 with a fixed 720-step window and…
