VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis
Aly R. Elkammar, Karim M. Gamaleldin, Catherine M. Elias

TL;DR
This paper introduces VIT-Ped, a transformer-based model for pedestrian intention prediction that achieves state-of-the-art performance on the JAAD dataset, enhancing autonomous driving safety.
Contribution
The paper presents a novel transformer-based algorithm for pedestrian behavior analysis, reports superior performance, and examines model design choices through extensive ablation studies.
Findings
Achieved SOTA performance on the JAAD dataset in accuracy, AUC, and F1-score.
Validated effectiveness of different model design choices through ablation studies.
Demonstrated the potential of vision transformers in pedestrian intention prediction.
Abstract
Pedestrian intention prediction is one of the key technologies in the transition from level 3 to level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features should be taken into consideration to make the roads of tomorrow safer for everybody. We introduce a transformer / video vision transformer based algorithm in different sizes that uses different data modalities. We evaluated our algorithms on a popular pedestrian behaviour dataset, JAAD, and reached and surpassed the state of the art (SOTA) in metrics such as accuracy, AUC, and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.
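For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, F1-score, and AUC can be computed, assuming the task is framed as binary classification (1 = crossing, 0 = not crossing). The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the paper or from JAAD.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is fine
    for a small illustration but slow for large evaluation sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not results from the paper.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1-score depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are often reported together.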
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic and Road Safety · Advanced Neural Network Applications
