MarginNCE: Robust Sound Localization with a Negative Margin

Sooyoung Park; Arda Senocak; Joon Son Chung

arXiv:2211.01966 · cs.CV · November 4, 2022 · 1 citation

TL;DR

This paper introduces MarginNCE, a contrastive learning method with a negative margin that makes sound source localization more robust to noisy audio-visual correspondences, matching or outperforming existing methods.

Contribution

The paper proposes a novel contrastive loss modification using a negative margin to handle noisy correspondences in sound localization tasks.

Findings

1. MarginNCE performs on par with or better than state-of-the-art methods.

2. Introducing a negative margin consistently improves existing contrastive approaches.

3. The approach effectively mitigates noise in audio-visual correspondence for localization.

Abstract

The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. Contrastive learning in the context of sound source localization leverages the natural correspondence between audio and visual signals, where audio-visual pairs from the same source are treated as positives, while randomly selected pairs are treated as negatives. However, this approach brings in noisy correspondences; for example, positive pairs whose audio and visual signals may be unrelated to each other, or negative pairs that may contain samples semantically similar to the positive one. Our key contribution in this work is to show that using a less strict decision boundary in contrastive learning can alleviate the effect of noisy correspondences in sound source localization. We propose a simple yet effective approach by slightly modifying the contrastive loss with a negative margin. Extensive…
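
The idea of relaxing the decision boundary with a negative margin can be illustrated with a minimal sketch. The exact loss used in the paper is not shown in this excerpt, so the code below assumes a common margin-based InfoNCE form in which the margin is subtracted from the positive similarity before the softmax. The function name `margin_nce`, its parameters, and the default temperature are hypothetical choices made for illustration only.

```python
import math


def margin_nce(pos_sim, neg_sims, margin=0.0, temperature=0.07):
    """InfoNCE-style loss for one sample, with a margin on the positive pair.

    Assumption (not taken from the paper's text): the margin is subtracted
    from the positive similarity, so margin < 0 raises the positive logit
    and loosens the decision boundary, while margin > 0 tightens it.
    """
    pos_logit = (pos_sim - margin) / temperature
    neg_logits = [s / temperature for s in neg_sims]

    # Numerically stable log-sum-exp over the positive and all negatives.
    logits = [pos_logit] + neg_logits
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))

    # Negative log-probability assigned to the positive pair.
    return log_sum_exp - pos_logit


if __name__ == "__main__":
    # Hypothetical cosine similarities for one audio-visual positive pair
    # and two randomly drawn negatives.
    for m in (-0.2, 0.0, 0.2):
        print(m, margin_nce(0.5, [0.4, 0.1], margin=m))
```

Under this assumed form, a negative margin lowers the loss for the same similarities, so a positive pair that is only weakly aligned (for instance, one whose audio and image are unrelated) is penalized less than under the standard loss with zero margin.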

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Contrastive Learning