LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction
Yixin Yan, Yang Li, Yuanfan Wang, Xiaozhou Zhou, Beihao Xia, Manjiang Hu, Hongmao Qin

TL;DR
LTMSformer is a lightweight transformer model that captures local temporal dependencies and high-order motion attributes to improve multi-agent trajectory prediction, outperforming baselines in accuracy and efficiency.
Contribution
The paper introduces a novel Local Trend-Aware Attention mechanism and a Motion State Encoder, enhancing temporal-spatial modeling in trajectory prediction with fewer parameters.
Findings
Outperforms baseline HiVT-64 with 4.35% lower minADE and 8.74% lower minFDE.
Reduces model size by 68% compared to HiVT-128 while maintaining higher accuracy.
Achieves a 20% improvement in miss rate (MR) over baseline methods.
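The three metrics named above can be computed as in the following minimal sketch. It assumes each prediction is a list of K candidate trajectories (modes) of 2D points, and that a scenario counts as a miss when the best final-point error exceeds 2.0 m, which is the common Argoverse convention; this page does not state the threshold or dataset, so treat that value as an assumption. The function names are illustrative and do not come from the paper.

```python
import math

def displacement(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_ade(modes, gt):
    """Minimum over modes of the average displacement error across all time steps."""
    return min(
        sum(displacement(p, g) for p, g in zip(mode, gt)) / len(gt)
        for mode in modes
    )

def min_fde(modes, gt):
    """Minimum over modes of the displacement error at the final time step."""
    return min(displacement(mode[-1], gt[-1]) for mode in modes)

def miss_rate(scenarios, threshold=2.0):
    """Fraction of scenarios whose minFDE exceeds the threshold.

    The 2.0 m default is an assumed convention, not a value taken from this page.
    """
    misses = sum(1 for modes, gt in scenarios if min_fde(modes, gt) > threshold)
    return misses / len(scenarios)

def relative_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline

# Toy data, made up for illustration only (not results from the paper).
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
good = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (1.0, 1.0), (2.0, 4.0)]]
bad = [[(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]]
scenarios = [(good, gt), (bad, gt)]
```

A percentage such as "4.35% lower minADE" is a relative reduction of this kind: `relative_reduction(baseline_value, model_value)` applied to the two models' metric values.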
Abstract
It has been challenging to model the complex temporal-spatial dependencies between agents for trajectory prediction. As each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, while most studies often overlook it. Besides, learning the high-order motion state attributes is expected to enhance spatial interaction modeling, but it is rarely seen in previous works. To address this, we propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism to capture the local temporal dependency by leveraging a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder to…
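To make the idea of convolutional attention over a local time window concrete, here is a small sketch in pure Python. It is not the authors' Local Trend-Aware Attention: the abstract excerpt gives no architectural details, so everything below is an assumption. The sketch uses the generic pattern in which queries and keys are produced by a causal 1D convolution (so each one summarizes the local trend rather than a single time step), and each step attends only to the last `window` steps. Scalar features, the kernel values, and all function names are hypothetical simplifications.

```python
import math

def causal_conv1d(seq, kernel):
    """Causal 1D convolution over a list of scalars, left-padded with zeros.

    Output step t depends only on inputs at steps t-k+1 .. t.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(seq))
    ]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_conv_attention(seq, q_kernel, k_kernel, window):
    """Attention restricted to a local time window, with convolutional queries/keys.

    Step t attends only to steps max(0, t-window+1) .. t. Values are the raw inputs.
    """
    q = causal_conv1d(seq, q_kernel)
    k = causal_conv1d(seq, k_kernel)
    outputs, weights = [], []
    for t in range(len(seq)):
        steps = range(max(0, t - window + 1), t + 1)
        w = softmax([q[t] * k[s] for s in steps])
        outputs.append(sum(wi * seq[s] for wi, s in zip(w, steps)))
        weights.append(w)
    return outputs, weights

# Toy 1D sequence, made up for illustration.
seq = [0.0, 1.0, 2.0, 3.0]
out, weights = local_conv_attention(seq, [0.5, 0.5], [0.5, 0.5], window=2)
```

One possible reading of "hierarchical local time boxes" is to stack such layers with growing windows, for example feeding `out` into a second call with `window=4`; that interpretation is a guess and is not confirmed by the text here.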
