Linear Attention is Enough in Spatial-Temporal Forecasting
Xinyu Ning

TL;DR
This paper introduces a novel Transformer-based approach for spatial-temporal traffic forecasting, using independent tokens for nodes over time and a Nyström-based variant for linear complexity, achieving state-of-the-art results.
Contribution
Proposes STformer and NSTformer models that effectively capture spatial-temporal patterns with improved efficiency and accuracy in traffic forecasting.
Findings
Achieves state-of-the-art performance on traffic datasets.
NSTformer offers linear complexity with competitive accuracy.
Models outperform existing methods in capturing dynamic road network topology.
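NSTformer is described only as a Nyström-based variant with linear complexity. The sketch below shows the generic Nyström approximation of softmax attention in the style of Nyströmformer, with landmarks taken as segment means of the queries and keys. That landmark choice, the function names (`full_attention`, `segment_means`, `nystrom_attention`) and all sizes are assumptions for illustration, not NSTformer's published code. With n tokens and m landmarks, only n×m and m×m matrices are formed, so for a fixed m the cost grows linearly in n instead of quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Exact scaled dot-product attention: builds an n x n score matrix.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def segment_means(X, m):
    # Hypothetical landmark choice: average m equal-length segments of rows.
    n, d = X.shape
    if n % m:
        raise ValueError("this sketch assumes n is divisible by m")
    return X.reshape(m, n // m, d).mean(axis=1)

def nystrom_attention(Q, K, V, m):
    # Nystrom approximation softmax(QK^T) V ~= F @ pinv(A) @ (B @ V);
    # no n x n matrix is ever materialized.
    d = Q.shape[-1]
    Q_l, K_l = segment_means(Q, m), segment_means(K, m)
    F = softmax(Q @ K_l.T / np.sqrt(d))    # n x m
    A = softmax(Q_l @ K_l.T / np.sqrt(d))  # m x m
    B = softmax(Q_l @ K.T / np.sqrt(d))    # m x n
    return F @ (np.linalg.pinv(A) @ (B @ V))

rng = np.random.default_rng(0)

# Sanity check: with m == n every landmark is a single token, and because
# S @ pinv(S) @ S == S for any matrix S, the approximation is exact.
Q, K, V = (rng.standard_normal((8, 8)) for _ in range(3))
exact = full_attention(Q, K, V)
approx = nystrom_attention(Q, K, V, m=8)
print(np.allclose(exact, approx))

# Typical use: many tokens, few landmarks (hypothetical sizes).
Qb, Kb, Vb = (rng.standard_normal((240, 16)) for _ in range(3))
out = nystrom_attention(Qb, Kb, Vb, m=24)
print(out.shape)
```

Choosing m trades accuracy for speed: smaller m means cheaper n×m products but a coarser approximation of the full attention matrix.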
Abstract
As the most representative spatial-temporal forecasting task, traffic forecasting has attracted considerable attention from the machine learning community due to its intricate correlations in both the spatial and temporal dimensions. Existing methods often treat road networks over time as spatial-temporal graphs and learn spatial and temporal representations independently. However, these approaches struggle to capture the dynamic topology of road networks, encounter issues with message passing mechanisms and over-smoothing, and face challenges in learning spatial and temporal relationships separately. To address these limitations, we propose treating nodes in road networks at different time steps as independent spatial-temporal tokens and feeding them into a vanilla Transformer to learn complex spatial-temporal patterns, yielding STformer, which achieves state-of-the-art (SOTA) performance. Given its quadratic…
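The core idea of treating every node at every time step as its own token can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a hypothetical array of shape (T, N, C) (time steps, road-network nodes, features), attention is single-head with random projection weights, and the positional or temporal embeddings, multi-head attention, feed-forward layers, residual connections and layer normalization a real model would use are omitted. The function names are hypothetical, not the paper's code. The point it shows is that all T·N tokens attend to each other, so the score matrix is (T·N)×(T·N), which is the quadratic cost the abstract refers to.

```python
import numpy as np

def to_st_tokens(X):
    # X: (T, N, C) readings for N road-network nodes over T time steps.
    # Each (time step, node) pair becomes an independent token: (T*N, C).
    # Row-major order puts node n at time t at index t*N + n.
    T, N, C = X.shape
    return X.reshape(T * N, C)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def st_self_attention(tokens, W_q, W_k, W_v):
    # Single-head vanilla self-attention over ALL spatial-temporal tokens,
    # so any node at any time step can attend to any node at any time step.
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (T*N, T*N): quadratic in T*N
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, N, C, d = 12, 20, 2, 16  # hypothetical sizes
X = rng.standard_normal((T, N, C))
W_q, W_k, W_v = (rng.standard_normal((C, d)) for _ in range(3))

tokens = to_st_tokens(X)
out = st_self_attention(tokens, W_q, W_k, W_v)
out_grid = out.reshape(T, N, d)  # back to per-time-step, per-node features
print(tokens.shape, out.shape, out_grid.shape)
```

Because no graph adjacency is used, the attention weights themselves play the role of a learned, input-dependent spatial-temporal topology.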
Taxonomy
Topics: Remote Sensing and LiDAR Applications
Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax
