Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

Xian Liu; Rui Qian; Hang Zhou; Di Hu; Weiyao Lin; Ziwei Liu; Bolei Zhou; Xiaowei Zhou

arXiv:2202.06406·cs.CV·February 15, 2022

TL;DR

This paper introduces the Interference Eraser framework for audio-visual sound source localization in real-world scenarios, effectively removing off-screen sounds and background noise to improve localization accuracy.

Contribution

The paper proposes a framework with modules that learn discriminative audio representations and remove cross-modal interference, targeting sound localization under real-world conditions.

Findings

01. Achieves superior localization results in wild scenarios

02. Effectively removes off-screen sounds and background noise

03. Outperforms previous methods in real-world tests

Abstract

The task of audio-visual sound source localization has been well studied under constrained scenes, where the audio recordings are clean. However, in real-world scenarios, audios are usually contaminated by off-screen sound and background noise. They will interfere with the procedure of identifying desired sources and building visual-sound connections, making previous studies non-applicable. In this work, we propose the Interference Eraser (IEr) framework, which tackles the problem of audio-visual sound source localization in the wild. The key idea is to eliminate the interference by redefining and carving discriminative audio representations. Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals. We thus extend the audio representation with our Audio-Instance-Identifier module, which…
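The "additive nature of audio signals" mentioned in the abstract can be illustrated with a small sketch. This is not the paper's method or its Audio-Instance-Identifier module; it only shows, under assumed toy values, that an on-screen source and an off-screen interferer sum into one waveform whose spectrum contains both components. All names here (`tone`, `dft_magnitudes`, `SAMPLE_RATE`, the chosen frequencies) are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 800  # Hz (assumed toy value)
N = 80             # number of samples, i.e. 0.1 s of audio


def tone(freq_hz, amplitude):
    """Return N samples of a sine wave at the given frequency."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(N)
    ]


def dft_magnitudes(signal):
    """Magnitude of a naive DFT for the non-negative frequency bins."""
    n_samples = len(signal)
    mags = []
    for k in range(n_samples // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(signal)
        )
        mags.append(abs(acc))
    return mags


# Hypothetical sources: one visible in the frame, one off-screen.
on_screen = tone(100, 1.0)
off_screen = tone(250, 0.6)

# A microphone records the sample-wise sum of both sources.
mixture = [a + b for a, b in zip(on_screen, off_screen)]

mags = dft_magnitudes(mixture)
bin_hz = SAMPLE_RATE / N  # width of one frequency bin in Hz

# The two strongest bins correspond to the two original sources.
peaks = sorted(sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2])
peak_freqs = [k * bin_hz for k in peaks]
print(peak_freqs)
```

Because the mixture carries both components at once, a single representation of the recording describes the on-screen source and the interference together, which is the limitation the abstract points to.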

Code & Models

Repositories

alvinliu0/visual-sound-localization-in-the-wild (Official)

Videos

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation