Learning Sound Localization Better From Semantically Similar Samples

Arda Senocak; Hyeonggon Ryu; Junsik Kim; In So Kweon

arXiv:2202.03007·cs.CV·February 8, 2022

Open Access

TL;DR

This paper improves sound source localization in visual scenes by treating semantically similar audio-visual pairs ("hard positives") as positives rather than negatives in contrastive learning, adding their response maps directly to the training objective.

Contribution

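It shows that semantically matched pairs that are mistakenly grouped as negatives ("hard positives") give response maps similar to those of the corresponding pairs, and it incorporates these response maps directly into the contrastive learning objective for sound localization.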

Findings

01 Effective on the VGG-SS and SoundNet-Flickr test sets

02 Performs favorably against state-of-the-art methods

03 Hard positives give response maps similar to those of the corresponding pairs

Abstract

The objective of this work is to localize the sound sources in visual scenes. Existing audio-visual works employ contrastive learning by assigning corresponding audio-visual pairs from the same source as positives while randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information. Thus, these semantically correlated pairs, "hard positives", are mistakenly grouped as negatives. Our key contribution is showing that hard positives can give similar response maps to the corresponding pairs. Our approach incorporates these hard positives by adding their response maps into a contrastive learning objective directly. We demonstrate the effectiveness of our approach on VGG-SS and SoundNet-Flickr test sets, showing favorable performance to the state-of-the-art methods.
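
The idea of counting hard positives as positives in a contrastive objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' loss: it reduces each audio-visual pair to a single scalar similarity score instead of a full response map, and the function name `contrastive_loss_with_hard_positives`, the `hard_pos_mask` input, and the `temperature` value are all hypothetical. How hard positives are actually selected is not described in the abstract, so the mask is simply taken as given.

```python
import numpy as np


def contrastive_loss_with_hard_positives(sim, hard_pos_mask, temperature=0.07):
    """InfoNCE-style loss where some mismatched pairs count as positives.

    sim:            (N, N) array; sim[i, j] scores image i against audio j.
    hard_pos_mask:  (N, N) boolean array; True marks a mismatched pair that is
                    assumed to be semantically similar (a "hard positive").
    temperature:    softmax temperature (illustrative value, an assumption).
    """
    n = sim.shape[0]
    # Corresponding pairs sit on the diagonal; hard positives are added to them.
    pos_mask = np.eye(n, dtype=bool) | hard_pos_mask

    logits = sim / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)

    pos = (exp * pos_mask).sum(axis=1)   # mass assigned to all positives
    total = exp.sum(axis=1)              # mass assigned to every pair in the row
    return float(-np.log(pos / total).mean())


# Toy batch: samples 0 and 1 are semantically similar, sample 2 is unrelated.
sim = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
no_hard = np.zeros((3, 3), dtype=bool)
hard = no_hard.copy()
hard[0, 1] = hard[1, 0] = True

baseline_loss = contrastive_loss_with_hard_positives(sim, no_hard)
hard_pos_loss = contrastive_loss_with_hard_positives(sim, hard)
```

In this toy batch, the plain objective penalizes the high similarity between samples 0 and 1 because it treats them as negatives; marking them as hard positives removes that penalty, so the loss is lower.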

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Contrastive Learning