VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis

Aly R. Elkammar; Karim M. Gamaleldin; Catherine M. Elias

arXiv:2601.01989·cs.CV·January 6, 2026


Open Access

TL;DR

This paper introduces VIT-Ped, a transformer-based model for pedestrian intention prediction that reports state-of-the-art accuracy, AUC, and F1-score on the JAAD dataset, a capability the authors consider important for safer autonomous driving.

Contribution

The paper presents a transformer-based algorithm for pedestrian behavior analysis, reports state-of-the-art results on JAAD, and examines its model design choices through extensive ablation studies.

Findings

01. Achieved state-of-the-art (SOTA) performance on the JAAD dataset in accuracy, AUC, and F1-score.

02. Validated the effectiveness of different model design choices through ablation studies.

03. Demonstrated the potential of vision transformers for pedestrian intention prediction.

Abstract

Pedestrian intention prediction is one of the key technologies in the transition from level 3 to level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features should be taken into consideration to make the roads of tomorrow safer for everybody. We introduce transformer and video vision transformer based models of different sizes that use different data modalities. We evaluated our models on a popular pedestrian behaviour dataset, JAAD, and matched or surpassed the SOTA in metrics such as accuracy, AUC, and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.
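The three metrics named in the abstract have standard definitions for a binary crossing / not-crossing task. The sketch below is a minimal, dependency-free illustration of those definitions, not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made up for the example and are not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half a win (the pairwise form of the ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = pedestrian will cross, 0 = will not cross.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]  # made-up model confidences

# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are commonly reported together.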

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Traffic and Road Safety · Advanced Neural Network Applications