Dolphins: Multimodal Language Model for Driving

Yingzi Ma; Yulong Cao; Jiachen Sun; Marco Pavone; Chaowei Xiao

arXiv:2312.00438 · cs.CV · December 4, 2023 · 1 citation

TL;DR

Dolphins is a multimodal vision-language model designed for autonomous driving, capable of understanding complex scenarios and performing diverse tasks with human-like reasoning and adaptability.

Contribution

The paper introduces Dolphins, a novel multimodal driving assistant model that incorporates a Grounded Chain of Thought process and domain-specific instruction tuning.
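The Grounded Chain of Thought idea can be illustrated with a minimal sketch of how a training target might be laid out: first a scene description, then the relevant objects grounded to image regions with bounding boxes, then the reasoning and the final answer. This page does not give the paper's actual GCoT format, so the three-stage layout, the field labels, the normalized box convention, and the names `GroundedObject`, `format_box`, and `build_gcot_target` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class GroundedObject:
    """An object mentioned in the reasoning, tied to an image region (hypothetical)."""
    label: str
    box: Box


def format_box(box: Box) -> str:
    """Validate a normalized box and render it as '[x1, y1, x2, y2]' with two decimals."""
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 <= x2 <= 1.0 and 0.0 <= y1 <= y2 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    return "[" + ", ".join(f"{v:.2f}" for v in box) + "]"


def build_gcot_target(
    description: str,
    objects: List[GroundedObject],
    reasoning: str,
    answer: str,
) -> str:
    """Assemble a hypothetical GCoT-style target: describe, ground, reason, answer."""
    lines = [f"Description: {description}"]             # stage 1: overall scene
    for obj in objects:                                  # stage 2: grounded evidence
        lines.append(f"Object: {obj.label} at {format_box(obj.box)}")
    lines.append(f"Reasoning: {reasoning}")              # stage 3: reasoning over the evidence
    lines.append(f"Answer: {answer}")                    # final response
    return "\n".join(lines)


if __name__ == "__main__":
    target = build_gcot_target(
        description="A city street with a traffic light ahead.",
        objects=[GroundedObject("red traffic light", (0.45, 0.10, 0.52, 0.25))],
        reasoning="The light facing the ego vehicle is red, so it must not enter the intersection.",
        answer="The car should stop before the intersection.",
    )
    print(target)
```

Putting the grounded objects before the reasoning is the point of the sketch: the answer is forced to follow from explicitly located evidence rather than from a free-form guess.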

Findings

01 Enhanced reasoning with Grounded Chain of Thought
02 Holistic understanding of complex driving scenarios
03 Emergence of human-like adaptive capabilities

Abstract

The quest for fully autonomous vehicles (AVs) capable of navigating complex real-world scenarios with human-like understanding and responsiveness continues. In this paper, we introduce Dolphins, a novel vision-language model designed to acquire human-like abilities as a conversational driving assistant. Dolphins processes multimodal inputs comprising video (or image) data, text instructions, and historical control signals, and generates outputs that respond to the given instructions. Building on the open-source pretrained vision-language model OpenFlamingo, we first enhance Dolphins' reasoning capabilities through a Grounded Chain of Thought (GCoT) process. We then tailor Dolphins to the driving domain by constructing driving-specific instruction data and performing instruction tuning. Using the BDD-X dataset, we designed and…
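The abstract's input description (video or image data, a text instruction, and historical control signals) can be made concrete with a minimal sketch of how one sample might be serialized into an OpenFlamingo-style interleaved prompt, optionally preceded by in-context demonstrations. Only the `<image>` and `<|endofchunk|>` tokens follow OpenFlamingo's convention; the control-signal fields and units, the text template, the one-placeholder-per-clip choice, and the helpers `ControlSignal`, `DrivingSample`, `render_history`, `render_turn`, and `build_prompt` are hypothetical and are not taken from the Dolphins code. Frames are not loaded here, since OpenFlamingo passes pixels to its vision encoder separately from the text tokens.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# OpenFlamingo marks where visual media goes with "<image>" and closes each
# interleaved chunk with "<|endofchunk|>".
MEDIA_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class ControlSignal:
    """Hypothetical per-timestep control record (field names and units assumed)."""
    speed_mps: float
    accel_mps2: float
    course_deg: float


@dataclass
class DrivingSample:
    frame_paths: List[str]  # video frames; a single image is a one-frame clip
    instruction: str
    history: List[ControlSignal] = field(default_factory=list)  # oldest first


def render_history(history: Sequence[ControlSignal]) -> str:
    """Serialize past control signals as text; the most recent step is labeled t-1."""
    if not history:
        return "none"
    n = len(history)
    return "; ".join(
        f"t-{n - i}: speed {s.speed_mps:.1f} m/s, accel {s.accel_mps2:.1f} m/s^2, "
        f"course {s.course_deg:.1f} deg"
        for i, s in enumerate(history)
    )


def render_turn(sample: DrivingSample) -> str:
    """One media placeholder for the whole clip (an assumption), then the text fields."""
    if not sample.frame_paths:
        raise ValueError("each sample needs at least one frame")
    return (
        f"{MEDIA_TOKEN}Instruction: {sample.instruction} "
        f"Control history: {render_history(sample.history)} Answer:"
    )


def build_prompt(
    query: DrivingSample,
    demos: Sequence[Tuple[DrivingSample, str]] = (),
) -> str:
    """Interleave optional in-context demos (each closed by END_OF_CHUNK) before the query."""
    chunks = [f"{render_turn(s)} {answer}{END_OF_CHUNK}" for s, answer in demos]
    chunks.append(render_turn(query))  # left open so the model generates the answer
    return "".join(chunks)


if __name__ == "__main__":
    query = DrivingSample(
        frame_paths=["clip/000.jpg", "clip/001.jpg"],  # hypothetical paths
        instruction="What should the ego vehicle do next?",
        history=[ControlSignal(5.0, 0.5, 0.0), ControlSignal(5.5, 0.2, 1.0)],
    )
    print(build_prompt(query))
```

Rendering the control history as plain text keeps the pretrained language model unchanged; a learned embedding for the signals would be an alternative design with different trade-offs.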

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning