GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction

Jia Huang, Peng Jiang, Alvika Gautam, and Srikanth Saripalli

arXiv:2311.14786 · cs.CV · January 29, 2024 · 1 citation

Open Access

TL;DR

This paper evaluates GPT-4V's ability to predict pedestrian behavior in autonomous driving, highlighting its strengths in understanding complex scenarios and its limitations compared to specialized models.

Contribution

First to assess Vision Language Models like GPT-4V for pedestrian behavior prediction in autonomous driving contexts.

Findings

01. GPT-4V achieves 57% accuracy in zero-shot pedestrian behavior prediction.

02. GPT-4V effectively interprets complex traffic scenarios and pedestrian groups.

03. Challenges include detecting small pedestrians and assessing relative motion.

Abstract

Predicting pedestrian behavior is key to ensuring the safety and reliability of autonomous vehicles. Deep learning methods that learn from annotated video frame sequences have shown promise, but they often fail to fully capture the dynamic interactions between pedestrians and traffic that are crucial for accurate predictions. These models also lack nuanced common-sense reasoning. Moreover, manually annotating datasets for these models is expensive, and the process is difficult to adapt to new situations. The advent of Vision Language Models (VLMs) offers a promising alternative, thanks to their advanced visual and causal reasoning skills. To our knowledge, this research is the first to conduct both quantitative and qualitative evaluations of VLMs in the context of pedestrian behavior prediction for autonomous driving. We evaluate GPT-4V(ision) on publicly available pedestrian…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Human Pose and Action Recognition · Traffic and Road Safety