Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen, Shengyi Qian, Andrew Owens

TL;DR
This paper introduces a self-supervised method that jointly estimates camera rotation and localizes sound sources. It exploits the geometrically consistent changes in images and sounds that occur as the head rotates, and it requires no labeled data.
Contribution
It proposes a novel self-supervised framework that learns to estimate camera rotation and sound source direction simultaneously using cross-view binauralization.
Findings
Accurately estimates camera rotation on real and synthetic scenes.
Localizes sound sources with accuracy competitive with state-of-the-art methods.
Introduces a cross-view binauralization technique for better audio-visual representation learning.
Abstract
The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of images, while an audio model predicts the direction of sound sources from binaural sounds. We train these models to generate predictions that agree with one another. At test time, the models can be deployed independently. To obtain a feature representation that is well-suited to solving this challenging problem, we also propose a method for learning an audio-visual representation through cross-view binauralization: estimating binaural sound from one view, given images and sound from another. Our…
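The agreement between the visual and audio models described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `wrap_angle` and `agreement_loss`, the use of continuous azimuth angles in radians, the L1 penalty, and the sign convention (a camera rotation of `rot_pred` shifts a static source's apparent azimuth by `-rot_pred`) are all assumptions made for illustration.

```python
import numpy as np

def wrap_angle(a):
    """Wrap angles in radians to the interval [-pi, pi)."""
    return (np.asarray(a, dtype=float) + np.pi) % (2 * np.pi) - np.pi

def agreement_loss(rot_pred, az_src, az_tgt):
    """Hypothetical consistency loss between the two models.

    rot_pred: camera rotation predicted by the visual model from an image pair.
    az_src, az_tgt: sound azimuths predicted by the audio model from the
        binaural sound recorded at each of the two views.

    Assumed convention: for a static source, rotating the camera by rot_pred
    changes the apparent azimuth by -rot_pred, so (az_tgt - az_src) + rot_pred
    should be zero once wrapped to [-pi, pi).
    """
    residual = wrap_angle((np.asarray(az_tgt) - np.asarray(az_src)) + np.asarray(rot_pred))
    return float(np.mean(np.abs(residual)))

# Predictions that agree give zero loss; inconsistent ones do not.
consistent = agreement_loss(rot_pred=[0.3], az_src=[0.5], az_tgt=[0.2])
inconsistent = agreement_loss(rot_pred=[0.3], az_src=[0.5], az_tgt=[0.5])
```

Wrapping the residual keeps the penalty well behaved when a source crosses the ±π boundary between the two views. In a real training setup the predictions would come from neural networks and the loss would be differentiated by an autodiff framework; NumPy is used here only to keep the sketch self-contained.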
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
Methods: Test
