Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation
Namhee Kim, Woojin Park

TL;DR
This paper presents a novel autonomous driving assistance system that integrates vision adapters and GPT-4 for improved visual understanding and reasoning, closely matching human performance in complex driving scenarios.
Contribution
It introduces a vision-integrated LLM framework combining YOLOv4, ViT, and GPT-4 for enhanced decision-making in autonomous driving assistance.
Findings
System closely mirrors human performance in describing situations
Moderately aligns with human decisions in response generation
Demonstrates potential for improved autonomous driving support
Abstract
Traditional autonomous driving systems often struggle with reasoning in complex, unexpected scenarios due to limited comprehension of spatial relationships. In response, this study introduces a Large Language Model (LLM)-based Autonomous Driving (AD) assistance system that integrates a vision adapter and an LLM reasoning module to enhance visual understanding and decision-making. The vision adapter, combining YOLOv4 and Vision Transformer (ViT), extracts comprehensive visual features, while GPT-4 enables human-like spatial reasoning and response generation. Experimental evaluations with 45 experienced drivers revealed that the system closely mirrors human performance in describing situations and moderately aligns with human decisions in generating appropriate responses.
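As an illustration of how a vision adapter's output might be handed to an LLM reasoning module, the following is a minimal sketch, not the authors' implementation. It assumes the detector yields labeled bounding boxes with confidence scores and that these are serialized into a text prompt. The `Detection` class, the `horizontal_zone` and `build_prompt` helpers, the left/center/right zoning, the confidence threshold, and the prompt wording are all hypothetical; the abstract does not specify how YOLOv4 and ViT features are passed to GPT-4, and no model is actually called here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object; a hypothetical stand-in for detector output."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels
    confidence: float


def horizontal_zone(box, image_width):
    """Map a box's center x-coordinate to a coarse left/center/right zone."""
    center_x = (box[0] + box[2]) / 2
    if center_x < image_width / 3:
        return "left"
    if center_x < 2 * image_width / 3:
        return "center"
    return "right"


def build_prompt(detections, image_width, min_confidence=0.5):
    """Turn detections into a text prompt asking for a description and a response."""
    lines = [
        f"- {d.label} on the {horizontal_zone(d.box, image_width)} "
        f"(confidence {d.confidence:.2f})"
        for d in detections
        if d.confidence >= min_confidence  # drop low-confidence detections
    ]
    scene = "\n".join(lines) if lines else "- no objects detected"
    return (
        "You are a driving assistant. Objects detected ahead:\n"
        f"{scene}\n"
        "Describe the situation, then recommend a driving response."
    )


# Made-up detections for illustration only; not data from the paper.
detections = [
    Detection("pedestrian", (100, 200, 180, 400), 0.91),
    Detection("truck", (900, 150, 1200, 450), 0.84),
    Detection("traffic cone", (600, 380, 640, 430), 0.32),
]
prompt = build_prompt(detections, image_width=1280)
print(prompt)
```

The two-part request in the prompt (describe, then recommend) mirrors the two tasks on which the system was compared with human drivers: describing situations and generating responses.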
Taxonomy
Topics: Human-Automation Interaction and Safety · Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety
Methods: Attention Is All You Need · Batch Normalization · Global Average Pooling · Tanh Activation · Average Pooling · k-Means Clustering · Bottom-up Path Augmentation
