Long-Range Transformers for Dynamic Spatiotemporal Forecasting
Jake Grigsby, Zhe Wang, Nam Nguyen, Yanjun Qi

TL;DR
This paper introduces Spacetimeformer, a Long-Range Transformer that jointly learns spatial, temporal, and value interactions for multivariate time series forecasting, and reports competitive results on traffic, electricity, and weather benchmarks.
Contribution
The work proposes a novel spatiotemporal sequence formulation and a Transformer architecture that learns variable relationships directly from data without predefined graphs.
Findings
Achieves competitive results on traffic, electricity, and weather benchmarks.
Learns dynamic spatiotemporal relationships purely from data.
Outperforms traditional graph-based and sequence models.
Abstract
Multivariate time series forecasting focuses on predicting future values based on historical context. State-of-the-art sequence-to-sequence models rely on neural attention between timesteps, which allows for temporal learning but fails to consider distinct spatial relationships between variables. In contrast, methods based on graph neural networks explicitly model variable relationships. However, these methods often rely on predefined graphs that cannot change over time and perform separate spatial and temporal updates without establishing direct connections between each variable at every timestep. Our work addresses these problems by translating multivariate forecasting into a "spatiotemporal sequence" formulation where each Transformer input token represents the value of a single variable at a given time. Long-Range Transformers can then learn interactions between space, time, and…
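The "spatiotemporal sequence" formulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; the `Token` type, the function name, and the toy data are hypothetical and chosen only to show the idea that each token carries one variable's value at one timestep, rather than one token holding all variables at a timestep.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token layout: one variable's value at one timestep."""
    time_index: int      # which timestep the value was observed at
    variable_index: int  # which variable (the "spatial" position) it belongs to
    value: float


def flatten_to_spatiotemporal_sequence(series: List[List[float]]) -> List[Token]:
    """Turn a (timesteps x variables) table into one token per (time, variable) pair."""
    tokens = []
    for t, row in enumerate(series):
        for v, value in enumerate(row):
            tokens.append(Token(t, v, value))
    return tokens


# Toy data (made up for illustration): 3 timesteps of 2 variables.
series = [
    [10.0, 20.0],
    [11.0, 21.0],
    [12.0, 22.0],
]
tokens = flatten_to_spatiotemporal_sequence(series)
print(len(tokens))  # 6 tokens, versus 3 in a one-token-per-timestep layout
print(tokens[3])    # the token for variable 1 at timestep 1
```

With T timesteps and N variables, this layout produces T × N tokens instead of T, so attention can relate any variable at any timestep to any other directly. The longer sequence is presumably why the abstract pairs the formulation with Long-Range Transformers.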
Taxonomy
Topics: Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques · Energy Load and Power Forecasting
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer
