AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness

Yizhuo Yang, Shenghai Yuan, Muqing Cao, Jianfei Yang, and Lihua Xie

arXiv:2411.06789 · cs.RO · April 7, 2025

TL;DR

AV-PedAware is a self-supervised audio-visual fusion system for pedestrian awareness in robotics. It offers a lower-cost alternative to traditional LIDAR-based methods and is designed to cope with occlusion and lighting variations.

Contribution

This work pioneers the use of self-supervised audio-visual fusion for pedestrian detection, leveraging footstep sounds and visual data to predict pedestrian movements in real-world scenarios.

Findings

01. Achieves comparable accuracy to LIDAR systems at lower cost.
02. Effectively handles occlusion and lighting variations.
03. Demonstrates reliable 3D pedestrian detection using only audio-visual data.

Abstract

In this study, we introduce AV-PedAware, a self-supervised audio-visual fusion system designed to improve dynamic pedestrian awareness for robotics applications. Pedestrian awareness is a critical requirement in many robotics applications. However, traditional approaches that rely on cameras and LIDARs to cover multiple views can be expensive and susceptible to issues such as changes in illumination, occlusion, and weather conditions. Our proposed solution replicates human perception for 3D pedestrian detection using low-cost audio and visual fusion. This study represents the first attempt to employ audio-visual fusion to monitor footstep sounds for the purpose of predicting the movements of pedestrians in the vicinity. The system is trained through self-supervised learning based on LIDAR-generated labels, making it a cost-effective alternative to LIDAR-based pedestrian awareness.…
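
The abstract states that the system is trained with LIDAR-generated labels. As a rough illustration of that idea only, the sketch below pairs each audio-visual sample with the LIDAR pedestrian position closest in time, so the LIDAR output can serve as a training target without manual annotation. Everything here is an assumption for illustration: the names `AVSample`, `LidarDetection`, `nearest_label`, and `build_training_pairs`, the 0.05 s matching window, and the single-pedestrian label format are hypothetical and are not taken from the paper or its repository.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple

Position = Tuple[float, float, float]  # hypothetical label: pedestrian x, y, z in metres


@dataclass
class LidarDetection:
    timestamp: float    # seconds
    position: Position  # pedestrian position reported by a LIDAR-based detector


@dataclass
class AVSample:
    timestamp: float  # seconds
    audio_path: str   # hypothetical path to an audio clip
    image_path: str   # hypothetical path to a camera frame


def nearest_label(sample: AVSample,
                  detections: List[LidarDetection],
                  max_gap: float = 0.05) -> Optional[Position]:
    """Return the position of the detection closest in time to `sample`.

    `detections` must be sorted by timestamp. Returns None when no
    detection lies within `max_gap` seconds (an assumed tolerance).
    """
    times = [d.timestamp for d in detections]
    i = bisect_left(times, sample.timestamp)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = detections[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda d: abs(d.timestamp - sample.timestamp))
    if abs(best.timestamp - sample.timestamp) > max_gap:
        return None
    return best.position


def build_training_pairs(samples: List[AVSample],
                         detections: List[LidarDetection],
                         max_gap: float = 0.05) -> List[Tuple[AVSample, Position]]:
    """Attach a LIDAR-derived label to every sample that has a close match."""
    ordered = sorted(detections, key=lambda d: d.timestamp)
    pairs = []
    for sample in samples:
        label = nearest_label(sample, ordered, max_gap)
        if label is not None:
            pairs.append((sample, label))
    return pairs
```

In a setup like this, the LIDAR would only be needed while collecting training data; at inference time a model would consume the audio and image inputs alone, which is consistent with the cost argument made in the abstract.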

Code & Models

Repositories

yizhuoyang/AV-PedAware
PyTorch · Official
