GPT-4V Explorations: Mining Autonomous Driving
Zixuan Li

TL;DR
This paper investigates GPT-4V's capabilities for autonomous driving in mining environments, focusing on scene understanding, decision-making, and navigation in complex industrial settings.
Contribution
It demonstrates GPT-4V's potential for autonomous driving in mining, highlighting its scene comprehension and decision-making abilities in challenging settings.
Findings
Robust scene understanding and decision-making demonstrated
Challenges in vehicle type recognition and dynamic interactions
Effective navigation in complex mining environments
Abstract
This paper explores the application of the GPT-4V(ision) large visual language model to autonomous driving in mining environments, where traditional systems often falter in understanding intentions and making accurate decisions during emergencies. GPT-4V introduces capabilities for visual question answering and complex scene comprehension, addressing challenges in these specialized settings. Our evaluation focuses on its proficiency in scene understanding, reasoning, and driving functions, with specific tests on its ability to recognize and interpret elements such as pedestrians, various vehicles, and traffic devices. While GPT-4V showed robust comprehension and decision-making skills, it faced difficulties in accurately identifying specific vehicle types and managing dynamic interactions. Despite these challenges, its effective navigation and strategic decision-making demonstrate its…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety
