Do We Need Sound for Sound Source Localization?
Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

TL;DR
This paper investigates the necessity of audio information in sound source localization, developing an unsupervised system that emphasizes visual cues and reveals visual dominance in current benchmarks.
Contribution
The authors propose an unsupervised two-step system for sound source localization and highlight the dominance of visual information over audio in existing datasets.
Findings
Visual information alone achieves localization performance comparable to that of the full audio-visual system.
Current datasets are inadequate for evaluating audio contribution.
An alternative evaluation protocol is proposed to better assess audio-visual integration.
Abstract
During the performance of sound source localization which uses both visual and aural information, it presently remains unclear how much either image or sound modalities contribute to the result, i.e. do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing this task into two steps: (i) "potential sound source localization", a step that localizes possible sound sources using only visual information (ii) "object selection", a step that identifies which objects are actually sounding using aural information. Our overall system achieves state-of-the-art performance in sound source localization, and more importantly, we find that despite the constraint on available information, the results of (i) achieve similar performance. From this observation and further…
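The two-step decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the linear visual scorer `w`, the cosine-similarity gating, and the toy features are all assumptions made for illustration. It shows only the general idea of a visual-only "potential" map that is then filtered by an audio-conditioned selection score.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v, eps=1e-8):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)

def potential_localization(visual_feats, w):
    # Step (i), visual only: score each image location with a hypothetical
    # linear scorer w followed by a sigmoid. No audio is used here.
    return [1.0 / (1.0 + math.exp(-dot(f, w))) for f in visual_feats]

def object_selection(visual_feats, audio_emb):
    # Step (ii), audio-conditioned: assumed here to be the cosine similarity
    # between the audio embedding and each location's feature, clipped at 0.
    return [max(0.0, cosine(f, audio_emb)) for f in visual_feats]

def localize(visual_feats, audio_emb, w):
    # Final map: potential sources kept only where the audio agrees.
    potential = potential_localization(visual_feats, w)
    selection = object_selection(visual_feats, audio_emb)
    final = [p * s for p, s in zip(potential, selection)]
    return potential, final

# Toy example: three image locations with 2-D features (made-up values).
feats = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
w = [1.0, 1.0]        # hypothetical visual scorer
audio = [1.0, 0.0]    # hypothetical audio embedding
potential, final = localize(feats, audio, w)
```

In this toy setup the first two locations receive the same visual-only score, and only the audio step separates them. Comparing `potential` against `final` on a benchmark is one way to see how much the audio step actually changes the result.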
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
