WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting

Jiecheng Lu; Xu Han; Yan Sun; Shihao Yang

arXiv:2410.03159 · cs.LG · February 6, 2026

TL;DR

This paper introduces WAVE, an attention mechanism that combines autoregressive (AR) and moving-average (MA) components to capture both long-range and local temporal patterns, and reports state-of-the-art results in time series forecasting.

Contribution

The paper presents WAVE, an attention mechanism that integrates AR and MA components and can be adapted to various existing attention mechanisms, improving their time series forecasting performance.

Findings

01. WAVE improves forecasting accuracy across multiple datasets.

02. Incorporating ARMA structure enhances attention models' ability to capture temporal patterns.

03. WAVE achieves state-of-the-art results in time series forecasting.

Abstract

We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components. It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size…
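
For readers unfamiliar with the ARMA model that the abstract cites as inspiration, the following is a minimal sketch of the classical statistical ARMA(p, q) one-step forecast, in which the AR term is a weighted sum of past observations and the MA term is a weighted sum of past forecast errors. It is not the WAVE attention mechanism or the authors' code; the function names `arma_one_step` and `arma_filter`, the fixed coefficients, and the assumption that errors before the start of the series are zero are all illustrative choices.

```python
def arma_one_step(history, residuals, ar_coefs, ma_coefs, const=0.0):
    """One-step ARMA(p, q) prediction (hypothetical helper, for illustration).

    history:   past observations, most recent last
    residuals: past one-step forecast errors, most recent last
    """
    p, q = len(ar_coefs), len(ma_coefs)
    # AR term: weighted sum of the last p observations.
    ar_term = sum(ar_coefs[i] * history[-1 - i] for i in range(p))
    # MA term: weighted sum of the last q forecast errors.
    ma_term = sum(ma_coefs[j] * residuals[-1 - j] for j in range(q))
    return const + ar_term + ma_term


def arma_filter(series, ar_coefs, ma_coefs, const=0.0):
    """Run one-step predictions over a series, tracking the errors as it goes."""
    p, q = len(ar_coefs), len(ma_coefs)
    start = max(p, 1)  # need at least p past observations
    residuals = [0.0] * max(q, 1)  # assumption: errors before the start are zero
    preds = []
    for t in range(start, len(series)):
        pred = arma_one_step(series[:t], residuals, ar_coefs, ma_coefs, const)
        preds.append(pred)
        # The new forecast error feeds the MA term of later predictions.
        residuals.append(series[t] - pred)
    return preds


if __name__ == "__main__":
    # ARMA(1, 1) with illustrative coefficients on a short ramp.
    print(arma_filter([1.0, 2.0, 3.0, 4.0], ar_coefs=[1.0], ma_coefs=[0.5]))
    # prints [1.0, 2.5, 3.25]
```

In this toy run the AR term alone would lag the ramp by one step; the MA term corrects part of that lag using the previous error, which is the kind of short-range correction the abstract attributes to the MA component.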

Code & Models

Repositories

ljc-fvnr/arma-attention (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Time Series Analysis and Forecasting

Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Attention Is All You Need · Dropout · Byte Pair Encoding · Absolute Position Encodings