Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models
Katie Luo, Jingwei Ji, Tong He, Runsheng Xu, Yichen Xie, Dragomir Anguelov, Mingxing Tan

TL;DR
This paper introduces Plug-and-Forecast (PnF), a plug-and-play method that augments existing motion forecasting models with multimodal large language models (MLLMs), improving scene understanding and prediction in autonomous driving without fine-tuning the MLLMs.
Contribution
The paper presents a novel approach to integrate multimodal large language models into motion forecasting, improving performance without fine-tuning and facilitating adaptation to complex scenarios.
Findings
Significant performance improvements on Waymo and nuScenes datasets.
Effective zero-shot reasoning enhances motion prediction accuracy.
The integrated MLLMs require no fine-tuning.
Abstract
Current autonomous driving systems rely on specialized models for perceiving and predicting motion, which demonstrate reliable performance in standard conditions. However, generalizing cost-effectively to diverse real-world scenarios remains a significant challenge. To address this, we propose Plug-and-Forecast (PnF), a plug-and-play approach that augments existing motion forecasting models with multimodal large language models (MLLMs). PnF builds on the insight that natural language provides a more effective way to describe and handle complex scenarios, enabling quick adaptation to targeted behaviors. We design prompts to extract structured scene understanding from MLLMs and distill this information into learnable embeddings to augment existing behavior prediction models. Our method leverages the zero-shot reasoning capabilities of MLLMs to achieve significant improvements in motion…
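The pipeline the abstract describes (prompt an MLLM for structured scene understanding, map the answer to learnable embeddings, and use those to augment an existing forecaster) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: the field names and categories in SCENE_FIELDS, the JSON answer format, the embedding size, and the additive fusion are all assumptions chosen for brevity. In a real system the tables would be trained parameters, and the raw text would come from an actual MLLM call.

```python
import json
import random

# Hypothetical structured fields the prompt asks the MLLM to fill in.
# These names and categories are illustrative assumptions, not from the paper.
SCENE_FIELDS = {
    "agent_intent": ["go_straight", "turn_left", "turn_right", "stop"],
    "road_condition": ["clear", "construction", "congested"],
}

EMBED_DIM = 4  # assumed to match the forecaster's agent feature size


def init_embedding_tables(seed=0):
    """Build one table per field with one vector per category.

    In a trained model these vectors would be learnable parameters;
    here they are just small random numbers.
    """
    rng = random.Random(seed)
    return {
        field: {
            cat: [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
            for cat in cats
        }
        for field, cats in SCENE_FIELDS.items()
    }


def parse_mllm_response(raw_text):
    """Parse a JSON answer, keeping only known fields with known categories."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return {}  # unparseable answer: fall back to no augmentation
    if not isinstance(data, dict):
        return {}
    return {
        field: value
        for field, value in data.items()
        if field in SCENE_FIELDS and value in SCENE_FIELDS[field]
    }


def augment_features(agent_features, scene, tables):
    """Add each parsed field's embedding to the agent features (assumed fusion)."""
    out = list(agent_features)
    for field, category in scene.items():
        vec = tables[field][category]
        out = [a + b for a, b in zip(out, vec)]
    return out


if __name__ == "__main__":
    tables = init_embedding_tables()
    # Stand-in for the text an MLLM might return for one agent.
    raw = '{"agent_intent": "turn_left", "road_condition": "construction"}'
    scene = parse_mllm_response(raw)
    print(augment_features([0.0] * EMBED_DIM, scene, tables))
```

When the answer cannot be parsed, the sketch leaves the features unchanged, so the base forecaster behaves exactly as it would without the MLLM. That fallback is a design choice of this sketch, not something the abstract states.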
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Human Motion and Animation · Multimodal Machine Learning Applications
