TL;DR
AV-PedAware is a self-supervised audio-visual fusion system for pedestrian awareness in robotics. It offers a low-cost alternative to LIDAR-based methods and is designed to cope with changes in illumination, occlusion, and weather.
Contribution
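The authors present this work as the first attempt to use audio-visual fusion for pedestrian detection, combining footstep sounds with visual data to predict the movements of nearby pedestrians in real-world scenarios. The system is trained through self-supervised learning on LIDAR-generated labels.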
Findings
Achieves accuracy comparable to LIDAR systems at lower cost.
Effectively handles occlusion and lighting variations.
Demonstrates reliable 3D pedestrian detection using only audio-visual data.
Abstract
In this study, we introduce AV-PedAware, a self-supervised audio-visual fusion system designed to improve dynamic pedestrian awareness for robotics applications. Pedestrian awareness is a critical requirement in many robotics applications. However, traditional approaches that rely on cameras and LIDARs to cover multiple views can be expensive and susceptible to issues such as changes in illumination, occlusion, and weather conditions. Our proposed solution replicates human perception for 3D pedestrian detection using low-cost audio and visual fusion. This study represents the first attempt to employ audio-visual fusion to monitor footstep sounds for the purpose of predicting the movements of pedestrians in the vicinity. The system is trained through self-supervised learning based on LIDAR-generated labels, making it a cost-effective alternative to LIDAR-based pedestrian awareness.…
