HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning
Xiaodong Mei, Sheng Wang, Jie Cheng, Yingbing Chen, Dan Xu

TL;DR
HAMF introduces a hybrid attention and Mamba-based framework that jointly models scene context and future motion for improved autonomous driving trajectory prediction, achieving state-of-the-art results.
Contribution
The paper presents a novel joint scene understanding and motion prediction framework using attention mechanisms and a Mamba module, enhancing accuracy and diversity in forecasts.
Findings
Achieves state-of-the-art performance on the Argoverse 2 benchmark.
Effectively combines self-attention and cross-attention for scene and motion modeling.
Demonstrates that a lightweight architecture can reach high accuracy.
Abstract
Motion forecasting is a critical challenge in autonomous driving systems, requiring accurate prediction of surrounding agents' future trajectories. Existing approaches predict future motion states from scene context features extracted from historical agent trajectories and road layouts, but they suffer from information degradation during scene feature encoding. To address this limitation, we propose HAMF, a novel motion forecasting framework that learns future motion representations jointly with the scene context encoding, coherently combining scene understanding and future motion state prediction. We first embed the observed agent states and map information into 1D token sequences, together with the target multi-modal future motion features as a set of learnable tokens. Then we design a unified attention-based encoder, which synergistically combines…
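The token layout described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the token counts, the feature width `d_model`, the random projection weights, and the single-head `self_attention` helper are all assumptions made for illustration, and the Mamba module is not modeled. The sketch only shows agent tokens, map tokens, and a set of future-mode tokens being concatenated into one 1D sequence and passed through one joint self-attention step.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (N, d) sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the example.
d_model = 16
num_agents, num_map, num_modes = 4, 6, 3

# Stand-ins for embedded agent histories and map elements.
agent_tokens = rng.normal(size=(num_agents, d_model))
map_tokens = rng.normal(size=(num_map, d_model))
# Stand-in for the learnable future-motion tokens (one per predicted mode);
# in a trained model these would be parameters, not random draws.
mode_tokens = rng.normal(size=(num_modes, d_model))

# One 1D token sequence holding scene context and future-motion tokens.
sequence = np.concatenate([agent_tokens, map_tokens, mode_tokens], axis=0)

# Random projections stand in for learned attention weights.
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

encoded = self_attention(sequence, w_q, w_k, w_v)

# The last num_modes rows are the updated future-motion features.
mode_features = encoded[-num_modes:]
```

Because every token attends to every other token in the same pass, the mode tokens read from the scene tokens while the scene tokens are still being encoded, which is the joint-learning idea the abstract contrasts with encode-then-decode pipelines.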
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
Methods: Sparse Evolutionary Training · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
