Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Tsun-Hsuan Wang; Alaa Maalouf; Wei Xiao; Yutong Ban; Alexander Amini; Guy Rosman; Sertac Karaman; Daniela Rus

arXiv:2310.17642·cs.RO·October 27, 2023·1 cites

TL;DR

This paper introduces an end-to-end autonomous driving system built on multimodal foundation models. The system is robust to open-set environments, is more explainable, and supports improved training through text-based data augmentation.

Contribution

It presents a novel approach leveraging multimodal foundation models for open-set autonomous driving, extracting spatial and semantic features from transformers for robustness and explainability.

Findings

01 Achieves superior performance in diverse, out-of-distribution tests.

02 Enables data augmentation and debugging using text-based latent space simulation.

03 Demonstrates enhanced robustness and adaptability in autonomous driving scenarios.
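
To make the idea of text-queryable features and text-based latent-space augmentation concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the per-patch feature grid and the text embeddings are random stand-ins (a real system would take them from a vision-language encoder), and `similarity_map`, `swap_concept`, the concept names, and the threshold are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def similarity_map(patch_feats, text_emb):
    """Cosine similarity between every patch feature and one text embedding.

    patch_feats: (H, W, D) grid of per-patch features.
    text_emb:    (D,) embedding of a text query.
    Returns an (H, W) map that can be inspected to see where a concept appears.
    """
    return l2_normalize(patch_feats) @ l2_normalize(text_emb)

def swap_concept(patch_feats, src_emb, dst_emb, threshold=0.8):
    """Replace patches that match `src_emb` with the direction of `dst_emb`.

    Hypothetical latent-space augmentation: the original feature norm of each
    replaced patch is kept, only its direction changes.
    """
    mask = similarity_map(patch_feats, src_emb) > threshold
    out = patch_feats.copy()
    norms = np.linalg.norm(patch_feats[mask], axis=-1, keepdims=True)
    out[mask] = l2_normalize(dst_emb) * norms
    return out, mask

# Stand-in data (assumption): random vectors play the role of encoder outputs.
rng = np.random.default_rng(0)
dim = 32
tree_emb = rng.normal(size=dim)    # pretend embedding of the text "tree"
house_emb = rng.normal(size=dim)   # pretend embedding of the text "house"

feats = rng.normal(size=(4, 6, dim)) * 0.1  # background patches
feats[1:3, 2:4] += tree_emb                 # a 2x2 region that "contains a tree"

augmented, mask = swap_concept(feats, tree_emb, house_emb)
```

In this toy setup the similarity map doubles as a debugging view (which patches respond to a text query), and the swapped grid could be fed to a downstream policy as an augmented training sample without rendering any new image.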

Abstract

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundation models, offering multimodal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do…

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training