Recognition and 3D Localization of Pedestrian Actions from Monocular Video
Jun Hayakawa, Behzad Dariush

TL;DR
This paper presents a framework for recognizing pedestrian actions and localizing the pedestrians in 3D from monocular video, with the aim of improving prediction of pedestrian behavior in urban traffic scenes.
Contribution
It introduces a two-stream temporal relation network leveraging pose and RGB data for improved action recognition and a new loss-based network for 3D localization from monocular views.
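To make the two-stream idea concrete, here is a minimal sketch of late fusion between a pose stream and an RGB stream. It is an illustration under stated assumptions, not the authors' network: this summary does not say how the streams are combined, so the weighted averaging of class probabilities, the function names `softmax` and `fuse_two_streams`, the `pose_weight` parameter, and the action labels are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_streams(pose_logits, rgb_logits, pose_weight=0.5):
    """Weighted average of per-class probabilities from the two streams.

    Assumption: simple late fusion. The paper's actual fusion scheme
    is not described in this summary.
    """
    pose_probs = softmax(pose_logits)
    rgb_probs = softmax(rgb_logits)
    return [
        pose_weight * p + (1.0 - pose_weight) * r
        for p, r in zip(pose_probs, rgb_probs)
    ]

# Hypothetical labels and scores, chosen only for illustration.
ACTIONS = ["standing", "walking"]
fused = fuse_two_streams(pose_logits=[0.2, 1.5], rgb_logits=[1.0, 0.8])
predicted = ACTIONS[fused.index(max(fused))]
```

In this toy input the RGB stream slightly favors "standing" while the pose stream strongly favors "walking", so the fused prediction follows the pose stream. That is the kind of disagreement a second stream is meant to resolve.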
Findings
The two-stream network outperforms single-stream methods in action recognition on the JAAD dataset.
The localization network reduces average localization error on the KITTI dataset.
The framework produces effective qualitative results on the H3D driving dataset.
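As a reference point for the localization finding above, one common way to compute an average localization error is the mean Euclidean distance between predicted and ground-truth 3D positions. The sketch below assumes that definition; the paper's exact metric is not given in this summary, and the function name `average_localization_error` and the sample coordinates are hypothetical.

```python
import math

def average_localization_error(predicted, ground_truth):
    """Mean Euclidean distance between matched 3D points.

    Assumption: each entry is an (x, y, z) position, in the same
    units and coordinate frame for both lists.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)

# Hypothetical positions for two pedestrians.
pred = [(0.0, 0.0, 10.0), (3.0, 0.0, 20.0)]
truth = [(0.0, 0.0, 11.0), (0.0, 4.0, 20.0)]
error = average_localization_error(pred, truth)  # distances 1.0 and 5.0
```

A lower value means the predicted positions sit closer to the ground truth on average.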
Abstract
Understanding and predicting pedestrian behavior is an important and challenging area of research for realizing safe and effective navigation strategies in automated and advanced driver assistance technologies in urban scenes. This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view for the purpose of predicting intention and forecasting future trajectory. A challenge in addressing this problem in urban traffic scenes is attributed to the unpredictable behavior of pedestrians, whereby actions and intentions are constantly in flux and depend on the pedestrian's pose, their 3D spatial relations, and their interaction with other agents as well as with the environment. To partially address these challenges, we consider the importance of pose toward recognition and 3D localization of pedestrian actions. In particular, we propose an action…
