A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference
Qingwen Pu, Kun Xie, Hong Yang, Guocong Zhai

TL;DR
This paper presents PedX-LLM, a vision-and-knowledge enhanced large language model that significantly improves the generalizability and accuracy of pedestrian crossing behavior inference across diverse environments.
Contribution
It introduces a framework that combines LLaVA-extracted visual features and transportation domain knowledge with a LoRA-fine-tuned LLaMA-2-7B model, achieving state-of-the-art performance and strong cross-site generalization.
Findings
Achieves 82.0% balanced accuracy on known sites.
Reaches 66.9% zero-shot accuracy on unseen sites.
Few-shot learning improves accuracy to 72.2%.
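For readers unfamiliar with the metric, balanced accuracy is the mean of per-class recall, so a model cannot score well simply by favoring the majority class. The sketch below is illustrative only: the function name, the binary labels (1 = cross, 0 = wait), and the sample data are assumptions made for this example, not the paper's data or evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced sample: eight "cross" (1) and two "wait" (0) cases.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

On this made-up sample the plain accuracy is 0.9 while the balanced accuracy is 0.75, because half of the minority-class cases are missed.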
Abstract
Existing paradigms for inferring pedestrian crossing behavior, ranging from statistical models to supervised learning methods, demonstrate limited generalizability and perform inadequately on new sites. Recent advances in Large Language Models (LLMs) offer a shift from numerical pattern fitting to semantic, context-aware behavioral reasoning, yet existing LLM applications lack domain-specific adaptation and visual context. This study introduces Pedestrian Crossing LLM (PedX-LLM), a vision-and-knowledge enhanced framework designed to transform pedestrian crossing inference from site-specific pattern recognition to generalizable behavioral reasoning. By integrating LLaVA-extracted visual features with textual data and transportation domain knowledge, PedX-LLM fine-tunes a LLaMA-2-7B foundation model via Low-Rank Adaptation (LoRA) to infer crossing decisions. PedX-LLM achieves 82.0%…
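To give a sense of why LoRA makes fine-tuning a 7B-parameter model practical, the sketch below counts trainable parameters for one square projection matrix. LoRA freezes the original weight W (d_out x d_in) and trains only two low-rank factors, B (d_out x r) and A (r x d_in). The hidden size of 4096 is that of LLaMA-2-7B; the rank r = 8 and the function names are assumptions for illustration, since the abstract does not state the rank or the adapted layers.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


hidden = 4096  # hidden size of LLaMA-2-7B
rank = 8       # assumed rank; not taken from the paper

full = full_params(hidden, hidden)
lora = lora_trainable_params(hidden, hidden, rank)
ratio = lora / full
print(full, lora, ratio)
```

Under these assumptions, the adapter for a single 4096 x 4096 projection trains 65,536 parameters instead of 16,777,216, roughly 0.4% of the full matrix.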
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Automated Road and Building Extraction
