GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction
Jia Huang, Peng Jiang, Alvika Gautam, and Srikanth Saripalli

TL;DR
This paper evaluates GPT-4V's ability to predict pedestrian behavior in autonomous driving, highlighting its strengths in understanding complex scenarios and its limitations compared to specialized models.
Contribution
First to assess Vision Language Models like GPT-4V for pedestrian behavior prediction in autonomous driving contexts.
Findings
GPT-4V achieves 57% accuracy in zero-shot pedestrian behavior prediction.
GPT-4V effectively interprets complex traffic scenarios and pedestrian groups.
Challenges include detecting small pedestrians and assessing relative motion.
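For readers unfamiliar with how a figure like the 57% above is typically obtained, here is a minimal sketch of a classification accuracy computation. It assumes each prediction is a discrete label compared against a ground-truth annotation; the function name, the label strings, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the annotated label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data for illustration only; these are NOT results
# from the paper, and the label names are assumed.
toy_predictions = ["crossing", "not_crossing", "crossing", "crossing"]
toy_labels = ["crossing", "crossing", "crossing", "not_crossing"]
toy_accuracy = accuracy(toy_predictions, toy_labels)
```

In a zero-shot setting the predictions would come from prompting the model without any task-specific fine-tuning, so the metric itself is unchanged; only the source of the predictions differs.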
Abstract
Predicting pedestrian behavior is key to ensuring the safety and reliability of autonomous vehicles. While deep learning methods have shown promise by learning from annotated video frame sequences, they often fail to fully grasp the dynamic interactions between pedestrians and traffic, which are crucial for accurate predictions. These models also lack nuanced common sense reasoning. Moreover, the manual annotation of datasets for these models is expensive and challenging to adapt to new situations. The advent of Vision Language Models (VLMs) offers a promising way to address these issues, thanks to their advanced visual and causal reasoning skills. To our knowledge, this research is the first to conduct both quantitative and qualitative evaluations of VLMs in the context of pedestrian behavior prediction for autonomous driving. We evaluate GPT-4V(ision) on publicly available pedestrian…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Human Pose and Action Recognition · Traffic and Road Safety
