Self-supervised Moving Vehicle Tracking with Stereo Sound
Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

TL;DR
This paper presents a self-supervised method for localizing moving vehicles using stereo sound, transferring knowledge from visual detection models to audio-based localization without manual annotations.
Contribution
It introduces a novel framework that leverages unlabeled videos to train an audio-based vehicle localization system using self-supervision from visual data.
Findings
Outperforms baseline methods on the new Auditory Vehicle Tracking dataset.
Enables vehicle localization solely from stereo audio at inference time.
Assists visual localization under poor lighting conditions.
Abstract
Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data to learn to localize objects (moving vehicles) in a visual reference frame, purely using stereo sound at inference time. Since it is labor-intensive to manually annotate the correspondences between audio and object bounding boxes, we achieve this goal by using the co-occurrence of visual and audio streams in unlabeled videos as a form of self-supervision, without resorting to the collection of ground-truth annotations. In particular, we propose a framework that consists of a vision "teacher" network and a stereo-sound "student" network. During training, knowledge embodied in a well-established visual vehicle detection model is transferred to the audio…
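The teacher-student transfer described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear "student", the mean-squared-error loss, the synthetic features, and every name below (teacher_targets, student_forward, distillation_loss, train_step) are assumptions made for illustration only. The sketch shows the general idea that a frozen visual model supplies the training targets, so the audio model needs no manual annotations.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_targets(visual_features):
    """Hypothetical stand-in for a frozen, pretrained visual vehicle detector.

    A real teacher would run on video frames; here its output is simply
    the array passed in, treated as fixed pseudo-labels.
    """
    return visual_features

def student_forward(weights, audio_features):
    """Hypothetical linear stand-in for the stereo-sound student network."""
    return audio_features @ weights

def distillation_loss(student_out, teacher_out):
    """Mean squared error between student and teacher outputs (an assumed loss)."""
    return float(np.mean((student_out - teacher_out) ** 2))

def train_step(weights, audio_features, teacher_out, lr=0.5):
    """One gradient-descent step on the MSE loss; only the student is updated."""
    pred = student_forward(weights, audio_features)
    grad = 2.0 * audio_features.T @ (pred - teacher_out) / pred.size
    return weights - lr * grad

# Synthetic, time-aligned "audio" and "visual" features for 64 clips.
audio = rng.normal(size=(64, 8))            # assumed stereo-audio features
hidden_map = rng.normal(size=(8, 4))        # unknown audio-to-visual relation
visual = audio @ hidden_map                 # what the teacher "sees" per clip

targets = teacher_targets(visual)           # pseudo-labels, no human annotation
weights = np.zeros((8, 4))                  # student starts untrained

initial_loss = distillation_loss(student_forward(weights, audio), targets)
for _ in range(200):
    weights = train_step(weights, audio, targets)
final_loss = distillation_loss(student_forward(weights, audio), targets)

# At inference time only the audio input is needed.
audio_only_prediction = student_forward(weights, audio[:1])
print(initial_loss, final_loss, audio_only_prediction.shape)
```

In this toy setup the supervision signal comes entirely from the co-occurring visual stream, which mirrors the self-supervision idea in the abstract; the actual networks, inputs, and losses used in the paper may differ.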
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: Test
