Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models

Katie Luo; Jingwei Ji; Tong He; Runsheng Xu; Yichen Xie; Dragomir Anguelov; Mingxing Tan

arXiv:2510.17274·cs.CV·October 21, 2025

TL;DR

This paper introduces Plug-and-Forecast (PnF), a plug-and-play method that augments existing motion forecasting models with scene understanding extracted from multimodal large language models (MLLMs), improving prediction in autonomous driving without fine-tuning the MLLM.

Contribution

The paper presents a novel approach to integrate multimodal large language models into motion forecasting, improving performance without fine-tuning and facilitating adaptation to complex scenarios.

Findings

01. Significant performance improvements on Waymo and nuScenes datasets.

02. Effective zero-shot reasoning enhances motion prediction accuracy.

03. No fine-tuning required for the integrated models.

Abstract

Current autonomous driving systems rely on specialized models for perceiving and predicting motion, which demonstrate reliable performance in standard conditions. However, generalizing cost-effectively to diverse real-world scenarios remains a significant challenge. To address this, we propose Plug-and-Forecast (PnF), a plug-and-play approach that augments existing motion forecasting models with multimodal large language models (MLLMs). PnF builds on the insight that natural language provides a more effective way to describe and handle complex scenarios, enabling quick adaptation to targeted behaviors. We design prompts to extract structured scene understanding from MLLMs and distill this information into learnable embeddings to augment existing behavior prediction models. Our method leverages the zero-shot reasoning capabilities of MLLMs to achieve significant improvements in motion…
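
The step the abstract describes, turning structured MLLM output into learnable embeddings that augment an existing predictor, can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the label vocabulary, the embedding size, the answer parsing, and the choice to fuse by element-wise addition. The table is only initialized; in a real system its values would be trained together with the forecasting model.

```python
import random

# Hypothetical set of structured answers an MLLM might give for one agent.
INTENT_VOCAB = ["go_straight", "turn_left", "turn_right", "yield", "stop"]
EMBED_DIM = 4  # illustrative size, not from the paper


def init_embedding_table(vocab, dim, seed=0):
    """Create one small random vector per label (stand-in for learnable weights)."""
    rng = random.Random(seed)
    return {label: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for label in vocab}


def parse_mllm_answer(answer, table):
    """Map a free-text MLLM reply to a known label, or None if unrecognized."""
    normalized = answer.strip().lower().replace(" ", "_")
    return normalized if normalized in table else None


def augment_agent_feature(agent_feature, answer, table):
    """Add the embedding of the MLLM's answer to the agent's feature vector.

    Unrecognized answers leave the feature unchanged, so the base
    forecasting model still receives a valid input.
    """
    label = parse_mllm_answer(answer, table)
    if label is None:
        return list(agent_feature)
    return [f + e for f, e in zip(agent_feature, table[label])]


table = init_embedding_table(INTENT_VOCAB, EMBED_DIM)
feature = [1.0, 2.0, 3.0, 4.0]
augmented = augment_agent_feature(feature, " Turn Left ", table)
```

Falling back to the unmodified feature when the reply cannot be parsed is one way to keep the augmentation optional for the base model; whether PnF handles malformed replies this way is not stated in the abstract.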

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Human Motion and Animation · Multimodal Machine Learning Applications