Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Bo Jiang; Shaoyu Chen; Bencheng Liao; Xingyu Zhang; Wei Yin; Qian Zhang; Chang Huang; Wenyu Liu; Xinggang Wang

arXiv:2410.22313·cs.CV·October 30, 2024·2 cites

Open Access · 1 Repo · 1 Model

TL;DR

Senna integrates large vision-language models with end-to-end autonomous driving to improve planning accuracy and safety, leveraging scene understanding and reasoning for better decision-making.

Contribution

This work introduces Senna, a system that combines an LVLM with an end-to-end model, decouples high-level planning from low-level trajectory prediction, and uses a multi-stage training strategy for autonomous driving.

Findings

01 Achieves state-of-the-art planning performance on two datasets.

02 Reduces average planning error by 27.12% after pre-training.

03 Decreases collision rate by 33.33% with large-scale pre-training.

Abstract

End-to-end autonomous driving demonstrates strong planning capabilities with large-scale data but still struggles in complex, rare scenarios due to limited commonsense. In contrast, Large Vision-Language Models (LVLMs) excel in scene understanding and reasoning. The path forward lies in merging the strengths of both approaches. Previous methods using LVLMs to predict trajectories or control signals yield suboptimal results, as LVLMs are not well-suited for precise numerical predictions. This paper presents Senna, an autonomous driving system combining an LVLM (Senna-VLM) with an end-to-end model (Senna-E2E). Senna decouples high-level planning from low-level trajectory prediction. Senna-VLM generates planning decisions in natural language, while Senna-E2E predicts precise trajectories. Senna-VLM utilizes a multi-image encoding approach and multi-view prompts for efficient scene…
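The decoupling described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the decision vocabulary (`LATERAL`, `LONGITUDINAL`), the keyword parser `parse_decision`, and the kinematic rollout `predict_trajectory` are hypothetical stand-ins and are not taken from the Senna codebase. The point is only the interface: a language-level decision is reduced to a small categorical command, and a separate numeric component turns that command into waypoints.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical decision vocabulary (assumption; the abstract does not list one).
LATERAL = ("left", "straight", "right")
LONGITUDINAL = ("accelerate", "keep", "decelerate", "stop")


@dataclass(frozen=True)
class PlanningDecision:
    lateral: str
    longitudinal: str


def parse_decision(text: str) -> PlanningDecision:
    """Map a free-text planner reply onto the hypothetical vocabulary.

    Falls back to ("straight", "keep") when no keyword is found.
    """
    words = text.lower().replace(",", " ").replace(".", " ").split()
    lateral = next((w for w in words if w in LATERAL), "straight")
    longitudinal = next((w for w in words if w in LONGITUDINAL), "keep")
    return PlanningDecision(lateral, longitudinal)


def predict_trajectory(
    decision: PlanningDecision,
    speed_mps: float,
    horizon_s: float = 3.0,
    dt: float = 0.5,
) -> List[Tuple[float, float]]:
    """Toy stand-in for the trajectory predictor (not a learned model).

    Rolls out a constant-acceleration motion whose parameters are selected
    by the high-level decision; all numeric values are arbitrary.
    """
    accel = {"accelerate": 1.0, "keep": 0.0, "decelerate": -1.0, "stop": -3.0}[
        decision.longitudinal
    ]
    lateral_rate = {"left": 0.5, "straight": 0.0, "right": -0.5}[decision.lateral]
    x, y, v = 0.0, 0.0, speed_mps
    waypoints = []
    for _ in range(int(round(horizon_s / dt))):
        v = max(0.0, v + accel * dt)  # speed never goes negative
        x += v * dt                   # forward progress
        y += lateral_rate * dt        # lateral offset, positive = left
        waypoints.append((x, y))
    return waypoints


if __name__ == "__main__":
    decision = parse_decision("Turn left and decelerate.")
    print(decision)
    print(predict_trajectory(decision, speed_mps=5.0))
```

In this sketch the language side never emits coordinates, which mirrors the abstract's argument that LVLMs are poorly suited to precise numerical prediction; in the actual system both components are learned models rather than the keyword matcher and kinematic rollout used here.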

Code & Models

Repositories

hustvl/senna
PyTorch · Official

Models

🤗
rb93dett/Senna
model · 17 dl · ♡ 4

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques