On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Licheng Wen; Xuemeng Yang; Daocheng Fu; Xiaofeng Wang; Pinlong Cai; Xin Li; Tao Ma; Yingxuan Li; Linran Xu; Dengke Shang; Zheng Zhu; Shaoyan Sun; Yeqi Bai; Xinyu Cai; Min Dou; Shuanglu Hu; Botian Shi; Yu Qiao

arXiv:2311.05332 · cs.CV · November 29, 2023 · 21 cites

TL;DR

This paper evaluates GPT-4V(ision), a visual-language model, for autonomous driving, demonstrating its strengths in scene understanding and reasoning, while identifying key challenges for real-world deployment.

Contribution

The paper provides an exhaustive assessment of GPT-4V(ision) in autonomous driving scenarios, highlighting its capabilities and limitations in complex driving environments.

Findings

01. Superior scene understanding and causal reasoning compared to existing systems

02. Effective handling of out-of-distribution scenarios and recognizing intentions

03. Challenges in direction discernment, traffic light recognition, and spatial reasoning

Abstract

The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLM) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the…

Code & Models

Repositories

pjlab-adg/gpt4v-ad-exploration (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection