LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning

Yudong Liu; Spencer Hallyburton; Jiwoo Kim; Yueqian Lin; Yiming Li; Qinsi Wang; Hui Ye; Jingwei Sun; Miroslav Pajic; Yiran Chen; Hai Li

arXiv:2512.18211 · cs.RO · December 23, 2025

TL;DR

LLaViDA introduces an assistant for autonomous driving built on a vision-language model (VLM). It improves trajectory planning accuracy and safety, especially under challenging conditions, through a two-stage training pipeline and explicit chain-of-thought reasoning.

Contribution

The paper presents LLaViDA, a new VLM-based trajectory planner trained in two stages (supervised fine-tuning followed by Trajectory Preference Optimization) that outperforms existing methods on the NuScenes benchmark.

Findings

01

Achieves 0.31 m average L2 trajectory error

02

Collision rate of 0.10% on the NuScenes test set

03

Outperforms state-of-the-art methods in open-loop trajectory planning
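The page does not say exactly how the 0.31 m average L2 figure is computed, and open-loop conventions differ between papers: some average the error over all waypoints up to each horizon, others report only the error at the horizon waypoint. The sketch below assumes waypoints sampled at 2 Hz over a 3 s horizon (6 waypoints), averaging over all waypoints up to 1 s, 2 s and 3 s, and reports "average" as the mean of those three values. The function names `l2_at_horizons` and `average_l2` are hypothetical, for illustration only.

```python
import numpy as np

def l2_at_horizons(pred, gt, horizons_s=(1.0, 2.0, 3.0), hz=2.0):
    """Mean L2 error (metres) over all waypoints up to each horizon.

    pred, gt: arrays of shape (N, T, 2) holding bird's-eye-view (x, y)
    waypoints sampled at `hz` Hz (assumed 2 Hz, so T = 6 for 3 s).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance per sample and waypoint, shape (N, T)
    dist = np.linalg.norm(pred - gt, axis=-1)
    results = {}
    for h in horizons_s:
        k = int(round(h * hz))  # number of waypoints inside this horizon
        # Assumed convention: average over samples and waypoints 1..k
        results[h] = float(dist[:, :k].mean())
    return results

def average_l2(pred, gt):
    """Assumed 'average L2': mean of the 1 s, 2 s and 3 s values."""
    per_horizon = l2_at_horizons(pred, gt)
    return float(np.mean(list(per_horizon.values())))

# Usage: the error grows by 0.1 m per waypoint (0.1, 0.2, ..., 0.6 m)
gt = np.zeros((1, 6, 2))
pred = gt.copy()
pred[0, :, 0] = 0.1 * np.arange(1, 7)
per_h = l2_at_horizons(pred, gt)  # about 0.15, 0.25, 0.35 m
avg = average_l2(pred, gt)        # about 0.25 m
```

Under the other convention (error at the horizon waypoint only), the same trajectory would give 0.2, 0.4 and 0.6 m, so numbers reported under different conventions are not directly comparable.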

Abstract

Trajectory planning is a fundamental yet challenging component of autonomous driving. End-to-end planners frequently falter under adverse weather, unpredictable human behavior, or complex road layouts, primarily because they lack strong generalization or few-shot capabilities beyond their training data. We propose LLaViDA, a Large Language Vision Driving Assistant that leverages a Vision-Language Model (VLM) for object motion prediction, semantic grounding, and chain-of-thought reasoning for trajectory planning in autonomous driving. A two-stage training pipeline (supervised fine-tuning followed by Trajectory Preference Optimization, TPO) enhances scene understanding and trajectory planning by injecting regression-based supervision, producing a powerful "VLM Trajectory Planner for Autonomous Driving." On the NuScenes benchmark, LLaViDA surpasses state-of-the-art end-to-end and other…
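The abstract does not give the TPO objective, so the following is only a sketch of one plausible reading, not the authors' method. It assumes that "injecting regression-based supervision" means ranking sampled candidate trajectories by their L2 distance to the ground truth to form (chosen, rejected) pairs, and it substitutes the standard DPO (Direct Preference Optimization) loss as the preference objective. The helpers `rank_candidates_by_l2` and `dpo_loss`, and the value `beta=0.1`, are hypothetical.

```python
import numpy as np

def rank_candidates_by_l2(candidates, gt):
    """Hypothetical pair construction from regression error.

    candidates: (K, T, 2) trajectories sampled from the model.
    gt: (T, 2) ground-truth trajectory.
    Returns (chosen_idx, rejected_idx): lowest and highest mean L2 error.
    """
    err = np.linalg.norm(np.asarray(candidates) - np.asarray(gt), axis=-1).mean(axis=-1)
    return int(np.argmin(err)), int(np.argmax(err))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO objective used as a stand-in for TPO.

    Inputs are sequence log-likelihoods of the chosen/rejected trajectory
    text under the policy and a frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin)), computed stably
    return float(np.logaddexp(0.0, -beta * margin))

# Usage: three candidates offset from the ground truth by 1.0, 0.0 and 2.0 m per axis
gt_traj = np.zeros((6, 2))
cands = np.stack([np.full((6, 2), o) for o in (1.0, 0.0, 2.0)])
chosen, rejected = rank_candidates_by_l2(cands, gt_traj)  # (1, 2)
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)                # log(2), no preference learned yet
improved = dpo_loss(-0.5, -3.0, -1.0, -1.0)               # smaller: policy favors the chosen trajectory
```

Ranking by L2 ties the preference signal to the same regression error used for evaluation, which is one way a language-model objective could absorb regression-style supervision; the actual pair construction and loss in LLaViDA may differ.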

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Advanced Neural Network Applications