Dateformer: Time-modeling Transformer for Longer-term Series Forecasting
Julong Young, Junhui Chen, Feihu Huang, Jian Peng

TL;DR
Dateformer introduces a novel patch-wise Transformer approach for long-term time series forecasting, leveraging global information through time representations to significantly improve accuracy and forecast range.
Contribution
The paper proposes a new patch-wise Transformer model that utilizes global training data via time representations, overcoming the limitations of narrow lookback windows.
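To make "patch-wise" concrete, the sketch below groups a fine-grained series into day-sized patches. A Transformer could then attend over one token per day instead of one token per time step. The 15-minute sampling rate (96 steps per day) and the helper name `split_into_day_patches` are assumptions chosen for illustration. They are not details taken from the paper.

```python
from typing import List

STEPS_PER_DAY = 96  # assumed: 15-minute sampling, 24 * 4 steps per day (hypothetical)


def split_into_day_patches(series: List[float], steps_per_day: int = STEPS_PER_DAY) -> List[List[float]]:
    """Group a 1-D series into consecutive, non-overlapping day-sized patches.

    Hypothetical helper for illustration. Trailing steps that do not fill
    a whole day are dropped.
    """
    if steps_per_day <= 0:
        raise ValueError("steps_per_day must be positive")
    n_days = len(series) // steps_per_day
    return [series[d * steps_per_day:(d + 1) * steps_per_day] for d in range(n_days)]


if __name__ == "__main__":
    # Synthetic data: 7 full days plus 10 leftover steps.
    series = [float(i % STEPS_PER_DAY) for i in range(7 * STEPS_PER_DAY + 10)]
    patches = split_into_day_patches(series)
    print(len(series), "time steps ->", len(patches), "patches")
```

Under this assumption, each patch would become a single token, so the sequence a model attends over shrinks by a factor of `steps_per_day`.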
Findings
Achieves 33.6% relative accuracy improvement on 7 datasets
Extends forecast horizon to half a year
Outperforms existing models in long-term series forecasting
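For a sense of scale, the arithmetic below counts the tokens in a half-year forecast window in two cases: point-wise processing, and one token per day. It also counts the query-key pairs that full self-attention would score for each. The 15-minute sampling rate and the 182-day half year are assumptions. The paper's own datasets and patch sizes may differ.

```python
STEPS_PER_DAY = 96   # assumed 15-minute sampling (hypothetical)
HORIZON_DAYS = 182   # assumed length of "half a year" (hypothetical)


def attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores computed by full self-attention over n_tokens tokens."""
    return n_tokens * n_tokens


point_wise_tokens = HORIZON_DAYS * STEPS_PER_DAY  # one token per time step
patch_wise_tokens = HORIZON_DAYS                  # one token per day-sized patch

print(f"point-wise: {point_wise_tokens} tokens, {attention_pairs(point_wise_tokens):,} pairs")
print(f"patch-wise: {patch_wise_tokens} tokens, {attention_pairs(patch_wise_tokens):,} pairs")
```

Attention cost is quadratic in sequence length, so under these assumptions the point-wise setup scores 96² = 9,216 times as many pairs as the day-patch setup, whatever the horizon length.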
Abstract
Transformers have demonstrated impressive strength in long-term series forecasting. Existing prediction research has mostly focused on mapping a short past sub-series (the lookback window) to the future series (the forecast window). The longer time series in the training dataset is discarded once training is completed. Models can therefore rely only on lookback-window information at inference, which prevents them from analyzing time series from a global perspective. Moreover, the windows used by Transformers are quite narrow because the models must process every time step within them. Under this point-wise processing style, broadening the windows rapidly exhausts model capacity. For fine-grained time series, this creates a bottleneck in both information input and prediction output, which is fatal to long-term series forecasting. To overcome the barrier, we propose a brand-new methodology to utilize Transformer for time…
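The conventional setup described in the abstract can be sketched as follows. A training series is cut into (lookback window, forecast window) pairs. At inference, the model sees only the most recent lookback window. The function name `make_windows` and the window lengths are illustrative assumptions, not the paper's configuration.

```python
from typing import List, Tuple

Window = Tuple[List[float], List[float]]


def make_windows(series: List[float], lookback: int, horizon: int, stride: int = 1) -> List[Window]:
    """Slice a series into (lookback, forecast) training pairs.

    Hypothetical helper: each pair maps `lookback` past steps to the
    next `horizon` steps, with window starts spaced `stride` apart.
    """
    if lookback <= 0 or horizon <= 0 or stride <= 0:
        raise ValueError("lookback, horizon and stride must be positive")
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        past = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        pairs.append((past, future))
    return pairs


if __name__ == "__main__":
    series = [float(t) for t in range(20)]
    train_pairs = make_windows(series, lookback=6, horizon=3, stride=2)
    print(len(train_pairs), "training pairs; first:", train_pairs[0])
    # At inference only the latest lookback window is fed to the model;
    # the rest of the training series is no longer consulted.
    latest_input = series[-6:]
    print("inference input:", latest_input)
```

In this framing, widening `lookback` is the only way to give the model more context. For a point-wise Transformer, each extra time step is another token to attend over.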
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Forecasting Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Byte Pair Encoding
