NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting
Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll

TL;DR
This paper introduces NAST, a non-autoregressive spatial-temporal Transformer for time series forecasting. It avoids the accumulative errors of autoregressive decoding and models spatial and temporal dependencies jointly through a spatial-temporal attention mechanism.
Contribution
It proposes what the authors describe as the first non-autoregressive Transformer for time series forecasting. It also introduces a novel spatial-temporal attention mechanism that uses a learned temporal influence map to bridge spatial and temporal attention.
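Below is a minimal NumPy sketch of what non-autoregressive decoding means in general. It is not NAST's actual decoder. It assumes a single cross-attention layer, hypothetical learned per-step query embeddings (`queries`), a hypothetical output head (`w_out`), and random stand-ins for the encoder memory. The point is that all horizon steps come out of one parallel pass, and no prediction is fed back in as input.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory):
    # Scaled dot-product attention: every query attends to the full encoder memory.
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)   # (T_out, T_in)
    return softmax(scores, axis=-1) @ memory   # (T_out, d)

def decode_non_autoregressive(memory, query_embeddings, w_out):
    # All T_out horizon steps are produced in one parallel pass. No step
    # consumes the model's own earlier predictions, so no causal mask is needed.
    hidden = cross_attention(query_embeddings, memory)
    return hidden @ w_out                      # (T_out, out_dim)

rng = np.random.default_rng(0)
T_in, T_out, d, out_dim = 8, 5, 16, 2
memory = rng.normal(size=(T_in, d))            # stand-in for encoder output
queries = rng.normal(size=(T_out, d))          # hypothetical learned per-step queries
w_out = rng.normal(size=(d, out_dim))          # hypothetical head, e.g. a 2-D location
preds = decode_non_autoregressive(memory, queries, w_out)
print(preds.shape)  # (5, 2)
```

Each horizon step depends only on its own query and the encoder memory. Reversing the queries therefore just reverses the outputs, and any single step can be decoded on its own. An autoregressive decoder would instead need `T_out` sequential calls, each conditioned on the previous prediction.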
Findings
Achieves state-of-the-art performance on ego-centric localization datasets.
Effectively reduces accumulative errors compared to autoregressive models.
Demonstrates superior real-time forecasting accuracy.
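The accumulative-error finding can be illustrated with a toy scalar system, x_{t+1} = a * x_t. This is not the paper's experimental setup. The autoregressive model learns a one-step coefficient that is 2% too large and rolls it out. A hypothetical direct (non-autoregressive) predictor forecasts each horizon from the last observation with a fixed 2% error. Both the dynamics and the fixed-error assumption are illustrative only.

```python
def autoregressive_rollout(x0, a_hat, horizon):
    # Each step feeds the previous *prediction* back in, so the one-step
    # coefficient error is multiplied in again at every step.
    preds, x = [], x0
    for _ in range(horizon):
        x = a_hat * x
        preds.append(x)
    return preds

def direct_multi_horizon(x0, a, horizon, rel_err):
    # Toy non-autoregressive predictor: horizon k is predicted straight from x0
    # with a fixed relative error that does not depend on k (an assumption).
    return [(a ** k) * x0 * (1 + rel_err) for k in range(1, horizon + 1)]

def rel_errors(preds, truth):
    # Relative absolute error per horizon step.
    return [abs(p - t) / abs(t) for p, t in zip(preds, truth)]

a, x0, horizon, rel_err = 0.9, 1.0, 10, 0.02
truth = [(a ** k) * x0 for k in range(1, horizon + 1)]
ar = autoregressive_rollout(x0, a * (1 + rel_err), horizon)
nar = direct_multi_horizon(x0, a, horizon, rel_err)

for k, (e_ar, e_nar) in enumerate(zip(rel_errors(ar, truth), rel_errors(nar, truth)), start=1):
    print(f"step {k:2d}: AR rel. error {e_ar:.3f}  NAR rel. error {e_nar:.3f}")
```

Here the autoregressive relative error is exactly (1.02)^k - 1, reaching about 0.219 at step 10. The toy direct predictor stays at 0.020 for every step. Real non-autoregressive models also lose accuracy at long horizons. The sketch only isolates the feedback effect that autoregressive decoding adds on top.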
Abstract
Although the Transformer has achieved breakthrough success in a wide range of domains, especially Natural Language Processing (NLP), applying it to time series forecasting remains a great challenge. In time series forecasting, the autoregressive decoding of canonical Transformer models can introduce large accumulative errors. In addition, using the Transformer to handle the spatial-temporal dependencies in the problem still faces considerable difficulties. To tackle these limitations, this work is the first attempt to propose a Non-Autoregressive Transformer architecture for time series forecasting, aiming to overcome the time delay and accumulative error issues of the canonical Transformer. Moreover, we present a novel spatial-temporal attention mechanism, building a bridge by a learned temporal influence map to fill the gaps between the spatial and temporal attention, so that spatial and…
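The truncated abstract does not give NAST's exact formula. The following is therefore only a hypothetical NumPy reading of a spatial-temporal attention bridged by a learned temporal influence map. It makes three assumptions:

- The input `x` has shape (T, N, d), for T time steps and N spatial entities.
- Spatial attention is plain dot-product attention at each step, without query/key/value projections.
- A learned (T, T) matrix `influence_logits` is row-softmaxed to decide how much step t draws on the spatial context computed at step s.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention_weights(x):
    # x: (T, N, d). At each time step, the N entities attend to each other.
    d = x.shape[-1]
    scores = np.einsum("tid,tjd->tij", x, x) / np.sqrt(d)  # (T, N, N)
    return softmax(scores, axis=-1)

def spatial_temporal_attention(x, influence_logits):
    # influence_logits: (T, T) hypothetical learned parameter. After a row
    # softmax, row t weights the spatial context of every step s.
    influence = softmax(influence_logits, axis=-1)       # (T, T), rows sum to 1
    spatial = spatial_attention_weights(x)                # (T, N, N)
    per_step = np.einsum("sij,sjd->sid", spatial, x)      # spatial mixing at each step s
    out = np.einsum("ts,sid->tid", influence, per_step)   # temporal mixing via the map
    return out, influence, spatial

rng = np.random.default_rng(0)
T, N, d = 6, 4, 8
x = rng.normal(size=(T, N, d))
influence_logits = rng.normal(size=(T, T))
out, influence, spatial = spatial_temporal_attention(x, influence_logits)
print(out.shape)  # (6, 4, 8)
```

In this sketch the combined weight between (t, i) and (s, j) is influence[t, s] * spatial[s, i, j]. It sums to 1 over all (s, j), so each output is a convex combination over every entity at every step. With an identity influence map, the layer reduces to plain per-step spatial attention.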
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Statistical and numerical algorithms
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Byte Pair Encoding · Softmax · Label Smoothing · Layer Normalization · Dense Connections · Dropout
