Research on Driving Scenario Technology Based on Multimodal Large Language Model Optimization

Wang Mengjie; Zhu Huiping; Li Jian; Shi Wenxiu; Zhang Song

arXiv:2506.02014 · cs.CV · June 4, 2025

Open Access

TL;DR

This paper presents a comprehensive optimization framework for multimodal large language models tailored to complex driving scenarios, enhancing accuracy and efficiency in autonomous driving applications.

Contribution

The paper introduces dynamic prompt optimization, dataset construction from real and synthetic data, and training techniques such as knowledge distillation for driving-scenario models.
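As a rough illustration of the knowledge distillation mentioned here, the sketch below computes a common soft-target loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The summary does not say which distillation objective the paper uses, so this formulation, the function names, and the default temperature are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a generic soft-target loss, not necessarily the paper's objective.
    Extreme logits could underflow a student probability to zero; a production
    version would work with log-probabilities instead.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice it is usually combined with an ordinary supervised loss on ground-truth labels.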

Findings

01 Improved model accuracy in key driving tasks

02 Enhanced resource efficiency through quantization and distillation

03 Effective handling of complex driving environments

Abstract

With the advancement of autonomous and assisted driving technologies, higher demands are placed on the ability to understand complex driving scenarios. General-purpose multimodal large models have emerged as one solution to this challenge. However, applying these models in vertical domains involves difficulties in data collection, model training, and deployment optimization. This paper proposes a comprehensive method for optimizing multimodal models for driving-scenario tasks, including cone detection, traffic light recognition, speed limit recommendation, and intersection alerts. The method covers dynamic prompt optimization, dataset construction, model training, and deployment. Specifically, dynamic prompt optimization adjusts the prompt based on the input image content so that the model focuses on objects affecting the ego vehicle, enhancing its task-specific focus and judgment…
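The dynamic prompt idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical upstream step that lists the objects found in the image, and the `Detection` fields, `TASK_HINTS` table, and `build_prompt` function are invented names for this example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical summary of one object found in the input image."""
    label: str         # e.g. "traffic_cone", "traffic_light"
    in_ego_lane: bool  # assumed flag: does the object affect the ego vehicle's path?
    distance_m: float  # assumed distance estimate in metres


BASE_PROMPT = "You are a driving assistant. Describe hazards relevant to the ego vehicle."

# Illustrative task hints keyed by object label (an assumption, not from the paper).
TASK_HINTS = {
    "traffic_cone": "Check whether cones block or narrow the ego lane.",
    "traffic_light": "Report the state of the traffic light governing the ego lane.",
    "speed_limit_sign": "Recommend a speed consistent with the posted limit.",
    "intersection": "Warn about the upcoming intersection.",
}


def build_prompt(detections, max_distance_m=80.0):
    """Append hints only for nearby objects that affect the ego vehicle."""
    relevant = [d for d in detections
                if d.in_ego_lane and d.distance_m <= max_distance_m]
    relevant.sort(key=lambda d: d.distance_m)  # nearest objects first
    hints = []
    for det in relevant:
        hint = TASK_HINTS.get(det.label)
        if hint and hint not in hints:  # skip unknown labels and duplicates
            hints.append(hint)
    return " ".join([BASE_PROMPT] + hints)
```

For example, a scene with a cone in the ego lane and a traffic light for an adjacent lane yields a prompt that mentions only the cone, which is the kind of task-specific focus the abstract describes.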

Taxonomy

Topics: Simulation and Modeling Applications