UniUGP: Unifying Understanding, Generation, and Planning for End-to-end Autonomous Driving
Hao Lu, Ziyang Liu, Guangfeng Jiang, Yuanfei Luo, Sheng Chen, Yangang Zhang, and Ying-Cong Chen

TL;DR
UniUGP is a unified framework that combines scene understanding, future video generation, and trajectory planning to improve autonomous driving in complex, long-tail scenarios by leveraging visual dynamics and semantic reasoning.
Contribution
The paper introduces UniUGP, a novel hybrid architecture that unifies understanding, generation, and planning for autonomous driving, integrating pre-trained models and specialized datasets for enhanced reasoning and decision-making.
Findings
Achieves state-of-the-art performance in perception, reasoning, and decision-making.
Demonstrates superior generalization to challenging long-tail scenarios.
Effectively leverages unlabeled videos and large language models.
Abstract
Autonomous driving (AD) systems struggle in long-tail scenarios due to limited world knowledge and weak visual dynamic modeling. Existing vision-language-action (VLA)-based methods cannot leverage unlabeled videos for visual causal learning, while world-model-based methods lack the reasoning capabilities of large language models. In this paper, we construct multiple specialized datasets providing reasoning and planning annotations for complex scenarios. Then, a unified Understanding-Generation-Planning framework, named UniUGP, is proposed to synergize scene reasoning, future video generation, and trajectory planning through a hybrid expert architecture. By integrating pre-trained VLMs and video generation models, UniUGP leverages visual dynamics and semantic reasoning to enhance planning performance. Taking multi-frame observations and language instructions as input, it produces…
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis
