WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

TL;DR
This paper introduces WAVE, an attention mechanism that combines autoregressive (AR) and moving-average (MA) components to capture both long-range and local temporal patterns in time series forecasting. The authors report state-of-the-art results.
Contribution
The paper presents WAVE, an attention mechanism that integrates AR and MA components and can be adapted to various existing attention mechanisms to improve time series forecasting performance.
Findings
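With appropriate tokenization and training methods, a decoder-only autoregressive Transformer achieves results comparable to the best baselines.
Incorporating the ARMA structure enhances attention models' ability to capture long-range and local temporal patterns.
WAVE improves forecasting accuracy across multiple datasets, with state-of-the-art results reported.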
Abstract
We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components. It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size…
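For orientation, the classical ARMA(p, q) recursion from statistics, which the abstract names as its inspiration, can be sketched as below. This is not the WAVE attention mechanism itself: the function name, the zero-initialised residuals, and the fixed scalar coefficients `phi` and `theta` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def arma_one_step_forecasts(x, phi, theta):
    """One-step-ahead forecasts from a classical ARMA(p, q) recursion.

    Hypothetical helper for illustration only (not from the paper).
    phi   : AR coefficients, applied to past observations.
    theta : MA coefficients, applied to past one-step forecast errors.
    Assumption: values and errors before the start of the series are zero.
    """
    x = np.asarray(x, dtype=float)
    p, q = len(phi), len(theta)
    preds = np.zeros_like(x)
    resid = np.zeros_like(x)
    for t in range(len(x)):
        # AR term: weighted sum of the previous p observations.
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA term: weighted sum of the previous q forecast errors.
        ma = sum(theta[j] * resid[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        preds[t] = ar + ma
        # The error at step t feeds the MA term of later steps.
        resid[t] = x[t] - preds[t]
    return preds, resid
```

Calling `arma_one_step_forecasts([1.0, 2.0, 3.0], phi=[0.5], theta=[0.5])` yields forecasts `[0.0, 1.0, 1.5]` and errors `[1.0, 1.0, 1.5]`. The AR term looks at past values, while the MA term corrects using recent errors, which is the general intuition behind pairing long-range and local components.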
Taxonomy
Topics: Neural Networks and Applications · Time Series Analysis and Forecasting
Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Attention Is All You Need · Dropout · Byte Pair Encoding · Absolute Position Encodings
