ProSTformer: Pre-trained Progressive Space-Time Self-attention Model for Traffic Flow Forecasting
Xiao Yan, Xianghua Gan, Jingjing Tang, Rui Wang

TL;DR
ProSTformer introduces a progressive space-time self-attention model that effectively captures spatiotemporal dependencies in traffic flow data, improving forecasting accuracy across various dataset scales.
Contribution
It proposes a novel factorized, progressive self-attention mechanism that incorporates spatiotemporal structure, reducing computation and enhancing traffic forecasting performance.
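The claim that factorization reduces computation can be illustrated with a rough count of query-key pairs. The sketch below compares joint attention over all space-time tokens with a generic split into one spatial pass and one temporal pass. The function names, the grid size, and the two-pass split are assumptions made for illustration; the paper's progressive scheme may partition the computation differently.

```python
def joint_attention_pairs(frames: int, height: int, width: int) -> int:
    """Query-key pairs when every space-time token attends to every other token."""
    tokens = frames * height * width
    return tokens * tokens


def factorized_attention_pairs(frames: int, height: int, width: int) -> int:
    """Query-key pairs for a generic spatial-then-temporal split (illustrative only)."""
    cells = height * width
    # Spatial pass: tokens attend only to other tokens in the same frame.
    spatial = frames * cells * cells
    # Temporal pass: each grid cell attends only to itself across frames.
    temporal = cells * frames * frames
    return spatial + temporal


# Hypothetical input: 12 frames on an 8 x 8 grid.
joint = joint_attention_pairs(frames=12, height=8, width=8)
factorized = factorized_attention_pairs(frames=12, height=8, width=8)
print(joint, factorized)
```

For this hypothetical input the joint form needs 589,824 pairs and the split form 58,368, roughly a tenfold reduction. The gap widens as the sequence grows, because the joint count is quadratic in the total number of tokens.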
Findings
ProSTformer outperforms six state-of-the-art methods on large-scale datasets.
Pre-training on large datasets improves performance on smaller datasets.
The model effectively captures local to global spatial and inside-outside temporal dependencies.
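The local-to-global spatial progression mentioned above can be pictured as attention applied first inside small windows of the grid and then over the whole grid. The following is a minimal sketch under strong simplifying assumptions: a single head, identity query/key/value projections (no learned weights), and hypothetical window sizes. It is not the authors' architecture, only an illustration of the idea of widening the attended region in stages.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return output


def windowed_attention(grid, window):
    """Run self-attention independently inside each window x window block."""
    height, width = len(grid), len(grid[0])
    result = [[None] * width for _ in range(height)]
    for top in range(0, height, window):
        for left in range(0, width, window):
            cells = [
                (r, c)
                for r in range(top, min(top + window, height))
                for c in range(left, min(left + window, width))
            ]
            attended = self_attention([grid[r][c] for r, c in cells])
            for (r, c), vector in zip(cells, attended):
                result[r][c] = vector
    return result


def local_to_global(grid, windows):
    """Apply windowed attention with progressively larger windows."""
    for window in windows:
        grid = windowed_attention(grid, window)
    return grid


# Toy 4 x 4 grid whose 2-dimensional features are the cell coordinates.
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
# Hypothetical schedule: 2 x 2 local windows, then the full 4 x 4 grid.
output = local_to_global(grid, windows=[2, 4])
```

Each stage keeps the grid shape unchanged, so stages with different window sizes can be chained freely. In the first stage a cell only mixes information with its three window neighbours; in the second it can draw on every cell.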
Abstract
Traffic flow forecasting is essential to intelligent city management and public safety, and it remains challenging. Recent studies have shown the potential of the convolution-free Transformer approach to extract the dynamic dependencies among complex influencing factors. However, two issues prevent the approach from being applied effectively to traffic flow forecasting. First, it ignores the spatiotemporal structure of traffic flow videos. Second, for a long sequence, it is hard to focus on the crucial attention because the number of dot-product computations grows quadratically with sequence length. To address these two issues, we first factorize the dependencies and then design a progressive space-time self-attention mechanism named ProSTformer. It has two distinctive characteristics: (1) corresponding to the factorization, the self-attention mechanism progressively focuses on spatial dependence from local to global regions, on temporal…
Taxonomy
Topics: Traffic Prediction and Management Techniques · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Dropout · Residual Connection · Dense Connections · Absolute Position Encodings · Byte Pair Encoding
