Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network
Tong Su, Yu Meng, Yan Xu

TL;DR
This paper introduces the Spatial Interaction Transformer (SIT), a generative model that uses attention mechanisms and a conditional variational autoencoder (CVAE) to predict complex pedestrian trajectories, improving accuracy over existing methods in autonomous driving scenarios.
Contribution
The paper proposes a new spatial interaction transformer model combined with CVAE for more accurate pedestrian trajectory prediction in traffic environments.
Findings
SIT outperforms state-of-the-art methods on the nuScenes dataset.
The model demonstrates robustness on the ETH and UCY datasets.
Attention-based approach effectively captures spatio-temporal pedestrian interactions.
Abstract
As a core technology of autonomous driving systems, pedestrian trajectory prediction can significantly enhance active vehicle safety functions and reduce road traffic injuries. In traffic scenes, pedestrians who encounter oncoming people may turn suddenly or stop immediately, which often leads to complicated trajectories. To predict such seemingly unpredictable trajectories, we can gain insights into the interaction between pedestrians. In this paper, we present a novel generative method named Spatial Interaction Transformer (SIT), which learns the spatio-temporal correlation of pedestrian trajectories through attention mechanisms. Furthermore, we introduce the conditional variational autoencoder (CVAE) framework to model the future latent motion states of pedestrians. In particular, the experiments based on the large-scale traffic dataset nuScenes [2] show that SIT has an…
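To make the idea of attention over pedestrian interactions concrete, here is a minimal sketch of scaled dot-product self-attention across the pedestrians in one frame. It is an illustration under stated assumptions, not the SIT architecture: the function name `spatial_self_attention` is hypothetical, raw 2D positions stand in for learned embeddings, and the learned query/key/value projections of a real Transformer layer are replaced by the identity.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_self_attention(features):
    """Scaled dot-product self-attention over pedestrians in one frame.

    features: list of equal-length feature vectors, one per pedestrian.
    Assumption: queries, keys, and values are the features themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    d = len(features[0])
    outputs, weights = [], []
    for q in features:
        # Similarity of this pedestrian to every pedestrian, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in features]
        w = softmax(scores)
        # Attention output: weighted average of all pedestrians' features.
        out = [sum(w[j] * features[j][i] for j in range(len(features)))
               for i in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Three hypothetical pedestrians described only by (x, y) position.
positions = [[0.0, 0.0], [1.0, 0.5], [4.0, 3.0]]
outputs, weights = spatial_self_attention(positions)
```

Each row of `weights` sums to 1 and indicates how strongly one pedestrian attends to the others. In a trained model the inputs would be learned embeddings of the trajectory history rather than raw coordinates.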
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques · Traffic and Road Safety
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Byte Pair Encoding · Softmax · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer
