ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang

TL;DR
ModeSeq introduces a sequential mode modeling approach for multimodal motion prediction, improving trajectory diversity and calibration without dense predictions or heuristic post-processing, thereby advancing autonomous driving safety.
Contribution
It proposes ModeSeq, a novel paradigm that models modes as sequences with step-by-step inference, enhancing multimodality reasoning and diversity in motion prediction.
Findings
Significantly improves multimodal trajectory diversity.
Achieves balanced accuracy on motion prediction benchmarks.
Supports mode extrapolation for highly uncertain futures.
Abstract
Anticipating the multimodality of future events lays the foundation for safe autonomous driving. However, multimodal motion prediction for traffic agents has been clouded by the lack of multimodal ground truth. Existing works predominantly adopt the winner-take-all training strategy to tackle this challenge, yet still suffer from limited trajectory diversity and uncalibrated mode confidence. While some approaches address these limitations by generating excessive trajectory candidates, they necessitate a post-processing stage to identify the most representative modes, a process lacking universal principles and compromising trajectory accuracy. We are thus motivated to introduce ModeSeq, a new multimodal prediction paradigm that models modes as sequences. Unlike the common practice of decoding multiple plausible trajectories in one shot, ModeSeq requires motion decoders to infer the next…
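The winner-take-all strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the ModeSeq implementation: the function name `winner_take_all_loss`, the array shapes, and the choice of average displacement error as the distance are all assumptions made for illustration. The idea shown is only that, with a single observed future per scene, the loss is computed against the closest of the K predicted trajectories, so the other modes receive no regression signal for that sample.

```python
import numpy as np

def winner_take_all_loss(pred, gt):
    """Hypothetical helper illustrating winner-take-all selection.

    pred: array of shape (K, T, 2), K candidate trajectories over T steps.
    gt:   array of shape (T, 2), the single observed future trajectory.
    Returns (loss, best_mode_index).
    """
    # Per-step Euclidean distance between each candidate and the ground truth.
    dists = np.linalg.norm(pred - gt[None], axis=-1)  # shape (K, T)
    # Average displacement error per mode (an assumed choice of distance).
    ade = dists.mean(axis=-1)  # shape (K,)
    # Only the closest mode contributes to the regression loss.
    best = int(np.argmin(ade))
    return float(ade[best]), best

# Toy scene: the agent actually drives straight along the x-axis.
gt = np.stack([np.arange(5.0), np.zeros(5)], axis=-1)
# Three candidates, each a constant lateral offset from the ground truth.
pred = np.stack([
    gt + np.array([0.0, 1.0]),
    gt + np.array([0.0, 0.1]),
    gt + np.array([0.0, -2.0]),
])
loss, best = winner_take_all_loss(pred, gt)
print(best, round(loss, 3))
```

In the toy scene the second candidate (index 1) is selected because its lateral offset is smallest. Because only the winning mode is supervised per sample, nothing in this loss by itself pushes the remaining modes apart or calibrates their confidences, which is the limitation the abstract attributes to this training strategy.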
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: ADaptive gradient method with the OPTimal convergence rate
