S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting

Zihao Wu, Juncheng Dong, Haoming Yang, and Vahid Tarokh

arXiv:2502.11340 · cs.LG · February 18, 2025

TL;DR

S2TX is a cross-attention multi-scale state-space Transformer for multivariate time series forecasting. It integrates long- and short-range patterns in a single model and lets the variates exchange information instead of being processed independently.

Contribution

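The paper proposes S2TX, a model that uses cross-attention to couple a Mamba state-space model, which extracts long-range cross-variate context, with a Transformer that uses local window attention to capture short-range representations. This unifies multi-scale and multivariate time series modeling in one architecture.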

Findings

01. Achieves state-of-the-art results on seven benchmark datasets.

02. Maintains a low memory footprint while improving accuracy.

03. Models both long- and short-range dependencies together with interactions between variates.

Abstract

Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as…
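The coupling described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes identity query/key/value projections (no learned weights), a single attention head, and a hand-written `global_ctx` list standing in for whatever context vectors the Mamba branch would produce. The function names `attend`, `local_window_attention`, and `cross_attention` are hypothetical and chosen only for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, allowed=None):
    """Scaled dot-product attention over plain lists of vectors.

    allowed(i, j) -> bool optionally restricts which key j query i may see.
    Assumption: identity projections, single head (illustration only).
    """
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        idx = [j for j in range(len(keys)) if allowed is None or allowed(i, j)]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        out.append([
            sum(w * values[j][k] for w, j in zip(weights, idx))
            for k in range(len(values[0]))
        ])
    return out


def local_window_attention(tokens, window):
    """Self-attention where token i only sees tokens with |i - j| <= window."""
    return attend(tokens, tokens, tokens, allowed=lambda i, j: abs(i - j) <= window)


def cross_attention(local_tokens, global_context):
    """Short-range tokens (queries) read from the global context (keys/values)."""
    return attend(local_tokens, global_context, global_context)


# Toy inputs: four short-range tokens and two global context vectors.
local = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
global_ctx = [[2.0, 0.0], [0.0, 2.0]]  # stand-in for the long-range branch output

short = local_window_attention(local, window=1)   # short-range mixing
fused = cross_attention(short, global_ctx)        # inject global context
```

In this sketch each row of `fused` is a convex combination of the rows of `global_ctx`, which shows the direction of information flow: the short-range tokens issue the queries, and the long-range context supplies the keys and values.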

Taxonomy

Topics: Time Series Analysis and Forecasting

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Position-Wise Feed-Forward Layer · Adam