Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
Jannik Zürn, Wolfram Burgard

TL;DR
This paper introduces a self-supervised method using audio-visual cues and contrastive learning to detect moving vehicles in videos, eliminating the need for manual annotations and improving robustness to illumination changes.
Contribution
It presents a novel self-supervised approach for vehicle detection that leverages audio-visual data and contrastive learning, reducing reliance on manual annotations and enhancing domain invariance.
Findings
Accurate vehicle detection without manual annotations
Effective audio-only detection model trained via teacher-student framework
Model robust to illumination changes and domain shifts
Abstract
Robust detection of moving vehicles is a critical task for any autonomously operating outdoor robot or self-driving vehicle. Most modern approaches for solving this task rely on training image-based detectors using large-scale vehicle detection datasets such as nuScenes or the Waymo Open Dataset. Providing manual annotations is an expensive and laborious exercise that does not scale well in practice. To tackle this problem, we propose a self-supervised approach that leverages audio-visual cues to detect moving vehicles in videos. Our approach employs contrastive learning for localizing vehicles in images from corresponding pairs of images and recorded audio. In extensive experiments carried out with a real-world dataset, we demonstrate that our approach provides accurate detections of moving vehicles and does not require manual annotations. We furthermore show that our model can be used…
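The contrastive idea in the abstract, pulling together embeddings of an image and its own recorded audio while pushing apart mismatched pairs, can be illustrated with a minimal sketch. This is not the authors' loss or architecture: it uses a generic InfoNCE-style objective, and the function names (`info_nce_loss`, `localization_heatmap`), the temperature value, and the toy embeddings are all assumptions made for illustration. The heatmap helper shows one plausible way a learned audio embedding could highlight image regions: by scoring each spatial feature against it.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(image_embs, audio_embs, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's exact loss).

    image_embs[i] and audio_embs[i] are treated as a matching pair;
    every other audio clip in the batch acts as a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    auds = [normalize(v) for v in audio_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        logits = [dot(img, aud) / temperature for aud in auds]
        # Numerically stable log-sum-exp over all candidate audio clips.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the target class.
        total += log_sum_exp - logits[i]
    return total / len(imgs)


def localization_heatmap(feature_map, audio_emb):
    """Cosine similarity between each spatial image feature and the audio.

    feature_map is a hypothetical H x W grid of feature vectors.
    """
    aud = normalize(audio_emb)
    return [[dot(normalize(cell), aud) for cell in row] for row in feature_map]


# Toy data: two image/audio pairs with 2-D embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_audio = [[1.0, 0.0], [0.0, 1.0]]
shuffled_audio = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce_loss(images, aligned_audio)
shuffled_loss = info_nce_loss(images, shuffled_audio)

# A 1 x 2 feature grid: the first cell agrees with the audio, the second does not.
heatmap = localization_heatmap([[[1.0, 0.0], [0.0, 1.0]]], [1.0, 0.0])
```

With these toy inputs the correctly paired batch yields a much lower loss than the shuffled one, and the heatmap scores the first cell higher than the second. In a real system the embeddings would come from trained image and audio encoders rather than hand-written vectors.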
Taxonomy
Topics: Video Surveillance and Tracking Methods · Speech and Audio Processing · Advanced Neural Network Applications
Methods: Contrastive Learning
