GBT: Two-stage transformer framework for non-stationary time series forecasting
Li Shen, Yuning Wei, Yangzhu Wang

TL;DR
This paper introduces GBT, a two-stage Transformer framework for non-stationary time series forecasting that improves the initialization of decoder inputs and reduces overfitting, outperforming state-of-the-art models on multiple benchmarks.
Contribution
Proposes GBT, a novel two-stage Transformer framework with Good Beginning and Error Score Modification to enhance forecasting of non-stationary time series.
Findings
GBT outperforms state-of-the-art time series forecasting Transformers (TSFTs) and other models on seven benchmark datasets.
GBT achieves better accuracy with lower computational complexity.
The framework is compatible with existing models to improve their performance.
Abstract
This paper shows that the time series forecasting Transformer (TSFT) suffers from a severe over-fitting problem caused by an improper initialization method for unknown decoder inputs, especially when handling non-stationary time series. Based on this observation, we propose GBT, a novel two-stage Transformer framework with Good Beginning. It decouples the prediction process of TSFT into two stages, an Auto-Regression stage and a Self-Regression stage, to tackle the problem of different statistical properties between input and prediction sequences. Prediction results of the Auto-Regression stage serve as a Good Beginning, i.e., a better initialization for the inputs of the Self-Regression stage. We also propose an Error Score Modification module to further enhance the forecasting capability of the Self-Regression stage in GBT. Extensive experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA…
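The two-stage data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: both stages here are plain least-squares linear maps standing in for the Transformer stages, the Error Score Modification module is omitted, and every name (`make_windows`, `fit_linear`, `apply_linear`, `two_stage_forecast`) is hypothetical. The only point is that the second stage starts from the first stage's forecast rather than from a placeholder such as zeros.

```python
import numpy as np

def make_windows(series, in_len, out_len):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - in_len - out_len + 1
    X = np.stack([series[i:i + in_len] for i in range(n)])
    Y = np.stack([series[i + in_len:i + in_len + out_len] for i in range(n)])
    return X, Y

def fit_linear(A, B):
    """Least-squares fit of B ~ [A, 1] @ W (assumed stand-in for a trained network)."""
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A1, B, rcond=None)
    return W

def apply_linear(A, W):
    """Apply the affine map fitted by fit_linear."""
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])
    return A1 @ W

def two_stage_forecast(series, in_len=24, out_len=8):
    X, Y = make_windows(series, in_len, out_len)

    # Stage 1 (stand-in for the Auto-Regression stage):
    # forecast the horizon from the observed input window.
    W1 = fit_linear(X, Y)
    begin = apply_linear(X, W1)  # the "good beginning"

    # Stage 2 (stand-in for the Self-Regression stage):
    # take the stage-1 forecast as its input and learn a correction to it.
    W2 = fit_linear(begin, Y - begin)
    refined = begin + apply_linear(begin, W2)
    return begin, refined, Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(300)
    # Synthetic non-stationary series: linear trend plus seasonality plus noise.
    series = 0.05 * t + np.sin(t / 5) + 0.1 * rng.standard_normal(300)
    begin, refined, Y = two_stage_forecast(series)
    print("stage-1 MSE:", np.mean((begin - Y) ** 2))
    print("stage-2 MSE:", np.mean((refined - Y) ** 2))
```

With linear stand-ins the second stage can add little beyond the first, so the sketch should be read for its data flow, not for accuracy.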
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Stock Market Forecasting Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization
