Is attention to bounding boxes all you need for pedestrian action prediction?
Lina Achaji, Julien Moreau, Thibault Fouqueray, Francois Aioun, Francois Charpillet

TL;DR
This paper introduces a Transformer-based framework that predicts pedestrian crossing actions using only bounding boxes, achieving state-of-the-art accuracy on real and simulated datasets, and examines whether attention over bounding-box sequences is sufficient for this task.
Contribution
It demonstrates that simple bounding box inputs can outperform previous methods and shows the benefits of pre-training on simulated data for pedestrian action prediction.
Findings
Achieved 91% accuracy and a 0.83 F1-score on the PIE dataset.
Achieved 91% accuracy and a 0.91 F1-score on the simulated CP2A dataset.
Pre-training on simulated data improves real-world prediction performance.
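For readers less familiar with the two reported metrics, the sketch below shows how accuracy and F1-score are computed for a binary crossing / not-crossing label. The function name `accuracy_and_f1`, the 0/1 label encoding (1 = crossing), and the toy labels are illustrative assumptions and do not come from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length lists of binary labels.

    `positive` marks the class treated as "crossing" (an assumed encoding).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example with made-up labels (not data from PIE or CP2A).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Accuracy counts every correct prediction, while F1 only looks at the positive class, so the two can diverge when the classes are imbalanced. That is one plausible reason for reporting both.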
Abstract
The human driver is no longer the only one concerned with the complexity of driving scenarios. Autonomous vehicles (AVs) are similarly becoming involved in the process. Nowadays, the development of AVs in urban places raises essential safety concerns for vulnerable road users (VRUs) such as pedestrians. Therefore, to make the roads safer, it is critical to classify and predict pedestrians' future behavior. In this paper, we present a framework based on multiple variations of the Transformer model that predicts a pedestrian's street-crossing decision from the dynamics of their initiated trajectory. We show that using solely bounding boxes as input features can outperform the previous state-of-the-art results, reaching a prediction accuracy of 91% and an F1-score of 0.83 on the PIE dataset. In addition, we introduced a large-size simulated dataset (CP2A)…
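To make the idea of attending over bounding boxes concrete, the sketch below runs one self-attention layer over a short sequence of boxes and pools the result into a crossing probability. It is a minimal illustration under stated assumptions, not the authors' architecture: the function names, the `(x1, y1, x2, y2)` normalized box layout, the embedding size, the single attention head, mean pooling, and the random untrained weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sinusoidal_positions(seq_len, d_model):
    """Absolute sinusoidal position encodings (d_model assumed even)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe


def init_params(d_model=16, seed=0):
    """Random, untrained weights; for illustration only."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "embed": rng.normal(0, scale, (4, d_model)),  # linear layer on box coords
        "wq": rng.normal(0, scale, (d_model, d_model)),
        "wk": rng.normal(0, scale, (d_model, d_model)),
        "wv": rng.normal(0, scale, (d_model, d_model)),
        "out": rng.normal(0, scale, (d_model,)),      # classification head
    }


def crossing_probability(boxes, params):
    """Map a (T, 4) box sequence to (probability, (T, T) attention weights)."""
    d_model = params["embed"].shape[1]
    x = boxes @ params["embed"] + sinusoidal_positions(len(boxes), d_model)
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(d_model))  # scaled dot-product attention
    h = x + attn @ v                            # residual connection
    logit = h.mean(axis=0) @ params["out"]      # mean-pool over time, then score
    return float(1.0 / (1.0 + np.exp(-logit))), attn


# Made-up track: a box drifting to the right over 8 frames.
T = 8
x1 = np.linspace(0.10, 0.45, T)
boxes = np.stack([x1, np.full(T, 0.40), x1 + 0.08, np.full(T, 0.75)], axis=1)

prob, attn = crossing_probability(boxes, init_params())
print(f"crossing probability (untrained): {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning. The point is only the data flow: each frame's box is embedded, tagged with its position in time, and allowed to attend to every other frame before a single decision is made.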
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety · Traffic and Road Safety
Methods: Attention Is All You Need · Entropy Regularization · Proximal Policy Optimization · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dense Connections · Softmax
