Recency Biased Causal Attention for Time-series Forecasting
Kareem Hegazy, Michael W. Mahoney, N. Benjamin Erichson

TL;DR
This paper introduces a recency-biased attention mechanism for Transformers that emphasizes recent observations in time-series data, improving forecasting accuracy by better capturing local dependencies.
Contribution
The paper proposes a simple reweighting of attention scores with a smooth heavy-tailed decay to introduce recency bias, aligning Transformer behavior more closely with the read, ignore, and write operations of RNNs.
Findings
Recency-biased attention consistently improves sequential modeling.
The method achieves competitive, and often superior, performance on challenging time-series forecasting benchmarks.
It strengthens local temporal dependencies while retaining the flexibility to capture broader, data-specific correlations.
Abstract
Recency bias is a useful inductive prior for sequential modeling: it emphasizes nearby observations and can still allow longer-range dependencies. Standard Transformer attention lacks this property, relying on all-to-all interactions that overlook the causal and often local structure of temporal data. We propose a simple mechanism to introduce recency bias by reweighting attention scores with a smooth heavy-tailed decay. This adjustment strengthens local temporal dependencies without sacrificing the flexibility to capture broader and data-specific correlations. We show that recency-biased attention consistently improves sequential modeling, aligning Transformers more closely with the read, ignore, and write operations of RNNs. Finally, we demonstrate that our approach achieves competitive and often superior performance on challenging time-series forecasting benchmarks.
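The reweighting idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the power-law decay (1 + lag)^(-alpha), the exponent name alpha, and the function name recency_biased_causal_attention are all assumptions chosen for illustration, since the summary does not state the exact decay used in the paper. The sketch adds the log of the decay to the causal attention logits, which is equivalent to multiplying the unnormalized attention weights by the decay and renormalizing.

```python
import numpy as np

def recency_biased_causal_attention(q, k, v, alpha=1.0):
    """Single-head causal attention with an assumed power-law recency decay.

    q, k, v: arrays of shape (T, d). alpha is a hypothetical decay exponent;
    alpha = 0 recovers standard causal softmax attention.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)              # (T, T) scaled dot-product logits
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]          # lag[i, j] = i - j (>= 0 for past keys)
    causal = lag >= 0                          # query i may attend to keys j <= i
    # Assumed heavy-tailed decay (1 + lag)^(-alpha), applied in log space.
    log_decay = -alpha * np.log1p(np.maximum(lag, 0))
    logits = np.where(causal, scores + log_decay, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)                   # masked (future) entries become 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: six time steps with four features each.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
out, weights = recency_biased_causal_attention(q, k, v, alpha=1.5)
```

Because a power law decays polynomially rather than exponentially, distant time steps keep non-negligible weight, so a strong query-key match can still override the bias toward recent observations.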
