LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction

Yixin Yan; Yang Li; Yuanfan Wang; Xiaozhou Zhou; Beihao Xia; Manjiang Hu; Hongmao Qin

arXiv:2507.04634 · cs.CV · July 8, 2025

TL;DR

LTMSformer is a lightweight transformer model that captures local temporal dependencies and high-order motion attributes to improve multi-agent trajectory prediction, outperforming baselines in accuracy and efficiency.

Contribution

The paper introduces a novel Local Trend-Aware Attention mechanism and a Motion State Encoder, enhancing temporal-spatial modeling in trajectory prediction with fewer parameters.
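The Motion State Encoder is built around high-order motion state attributes. As a rough illustration only, the sketch below derives attributes of that kind (velocity, acceleration, jerk, heading) from a 2-D track using backward finite differences. The source does not say which attributes the encoder uses or how it embeds them, so the attribute set, the 0.1 s sampling interval, and the helper name `motion_state_features` are all assumptions.

```python
import numpy as np


def motion_state_features(positions: np.ndarray, dt: float = 0.1) -> np.ndarray:
    """Hypothetical helper: derive higher-order motion states from a (T, 2) xy track.

    Returns a (T, 7) array: vx, vy, ax, ay, jx, jy, heading (radians).
    Uses backward finite differences; rows without enough history are zero.
    The attribute choice and dt = 0.1 s are assumptions, not the paper's spec.
    """
    positions = np.asarray(positions, dtype=float)
    vel = np.zeros_like(positions)
    vel[1:] = (positions[1:] - positions[:-1]) / dt   # first difference
    acc = np.zeros_like(positions)
    acc[2:] = (vel[2:] - vel[1:-1]) / dt              # second difference
    jerk = np.zeros_like(positions)
    jerk[3:] = (acc[3:] - acc[2:-1]) / dt             # third difference
    heading = np.arctan2(vel[:, 1], vel[:, 0])[:, None]
    return np.concatenate([vel, acc, jerk, heading], axis=1)


# Usage: constant acceleration of 2 m/s^2 along x, sampled every 0.1 s.
t = np.arange(6) * 0.1
track = np.stack([t**2, np.zeros_like(t)], axis=1)
feats = motion_state_features(track, dt=0.1)
print(feats.shape)  # (6, 7)
```

For a constant-acceleration track the acceleration columns settle at 2.0 and the jerk columns at zero once enough history is available, which is the kind of higher-order signal a raw position sequence exposes only implicitly.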

Findings

1. Outperforms the baseline HiVT-64, with 4.35% lower minADE and 8.74% lower minFDE.
2. Reduces model size by 68% compared to HiVT-128 while achieving higher accuracy.
3. Lowers the miss rate (MR) by 20% relative to baseline methods.
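The findings are stated as relative reductions in minADE, minFDE, and miss rate (MR). The sketch below shows how these metrics are commonly computed for K-mode predictions. It assumes Argoverse-style conventions that the text above does not state: the best mode per agent is the one with the lowest final displacement error, and a "miss" means that error exceeds 2.0 m. The function names are hypothetical.

```python
import numpy as np


def displacement_metrics(pred: np.ndarray, gt: np.ndarray, miss_threshold: float = 2.0):
    """Compute minADE, minFDE and MR for multi-modal predictions.

    pred: (N, K, T, 2) K candidate trajectories for each of N agents.
    gt:   (N, T, 2) ground-truth future trajectories.
    Assumption: best mode = lowest final displacement error; a miss means
    that error exceeds `miss_threshold` metres.
    """
    errors = np.linalg.norm(pred - gt[:, None], axis=-1)  # (N, K, T) pointwise L2 errors
    ade = errors.mean(axis=-1)                            # (N, K) average error per mode
    fde = errors[..., -1]                                 # (N, K) final-step error per mode
    best = fde.argmin(axis=1)                             # (N,) best mode per agent
    rows = np.arange(pred.shape[0])
    min_ade = ade[rows, best].mean()
    min_fde = fde[rows, best].mean()
    miss_rate = (fde[rows, best] > miss_threshold).mean()
    return float(min_ade), float(min_fde), float(miss_rate)


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Usage: two agents, two modes, two future steps.
pred = np.zeros((2, 2, 2, 2))
gt = np.zeros((2, 2, 2))
pred[0, 1, :, 1] = 3.0   # agent 0: mode 0 is exact, mode 1 is 3 m off
pred[1, 0, :, 1] = 2.5   # agent 1: both modes end more than 2 m away
pred[1, 1, :, 1] = 3.0
print(displacement_metrics(pred, gt))  # (1.25, 1.25, 0.5)
print(relative_reduction(2.0, 1.5))    # 25.0
```

All three metrics are lower-is-better, so a percentage "improvement" corresponds to `relative_reduction(baseline, ours)`; the same formula applies to a reduction in parameter count.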

Abstract

Modeling the complex temporal-spatial dependencies between agents for trajectory prediction remains challenging. Because each state of an agent is closely related to its states at adjacent time steps, capturing this local temporal dependency benefits prediction, yet most studies overlook it. In addition, learning high-order motion state attributes is expected to enhance spatial interaction modeling, but this has rarely been explored in previous work. To address these gaps, we propose a lightweight framework, LTMSformer, that extracts temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism that captures local temporal dependency through a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder to…
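The abstract describes Local Trend-Aware Attention only at a high level: a convolutional attention mechanism with hierarchical local time boxes. The NumPy sketch below is one plausible reading, not the authors' implementation. Queries and keys come from a causal 1-D convolution so that each step encodes a short local trend, attention is masked to non-overlapping time boxes, and outputs from several box sizes are averaged. The box sizes, the weights shared across levels, the averaging, and all function names are assumptions.

```python
import numpy as np


def causal_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Causal 1-D convolution over time: x (T, d_in), kernel (k, d_in, d_out).

    Output step t mixes input steps t-k+1 .. t (zero-padded on the left).
    """
    k = kernel.shape[0]
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        out[t] = np.einsum("ki,kio->o", padded[t:t + k], kernel)
    return out


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def box_attention(x, wq, wk, wv, box: int) -> np.ndarray:
    """Hypothetical: attention restricted to non-overlapping time boxes of length `box`."""
    q = causal_conv1d(x, wq)
    k = causal_conv1d(x, wk)
    v = x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    steps = np.arange(x.shape[0])
    same_box = (steps[:, None] // box) == (steps[None, :] // box)
    scores = np.where(same_box, scores, -np.inf)  # mask steps outside the box
    return softmax(scores) @ v


def hierarchical_local_attention(x, wq, wk, wv, boxes=(2, 4, 8)) -> np.ndarray:
    """Hypothetical: average box attention over several box sizes (fine to coarse)."""
    return np.mean([box_attention(x, wq, wk, wv, b) for b in boxes], axis=0)


# Usage with random weights: 8 time steps, 4 features, kernel size 3.
rng = np.random.default_rng(0)
T, d, ksz = 8, 4, 3
x = rng.normal(size=(T, d))
wq = rng.normal(size=(ksz, d, d)) * 0.1
wk = rng.normal(size=(ksz, d, d)) * 0.1
wv = rng.normal(size=(d, d)) * 0.1
y = hierarchical_local_attention(x, wq, wk, wv)
print(y.shape)  # (8, 4)
```

Setting `box=1` reduces the layer to a per-step value projection, while a single box covering the whole sequence recovers ordinary convolutional self-attention; stacking box sizes interpolates between these extremes.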