Enhancing Sound Source Localization via False Negative Elimination

Zengjie Song; Jiangshe Zhang; Yuxi Wang; Junsong Fan; Zhaoxiang Zhang

arXiv:2408.16448·cs.CV·August 30, 2024

TL;DR

This paper introduces an audio-visual learning framework that removes false negatives from contrastive training for sound source localization, improving performance on localization, audio-visual event classification, and object detection.

Contribution

The paper proposes two learning schemes, self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL), to address false negatives in contrastive learning, enhancing audio-visual feature alignment and robustness.

Findings

01 Outperforms state-of-the-art methods in sound source localization
02 Improves accuracy in audio-visual event classification
03 Enhances object detection performance

Abstract

Sound source localization aims to localize the objects emitting sound in visual scenes. Recent works that obtain impressive results typically rely on contrastive learning. However, the common practice of randomly sampling negatives in prior work can lead to the false negative issue, where sounds semantically similar to the visual instance are sampled as negatives and incorrectly pushed away from the visual anchor/query. As a result, this misalignment of audio and visual features could yield inferior performance. To address this issue, we propose a novel audio-visual learning framework which is instantiated with two individual learning schemes: self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL). SSPL explores image-audio positive pairs alone to discover semantically coherent similarities between audio and visual features, while a predictive coding…
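
To make the false negative issue concrete, here is a minimal sketch of an InfoNCE-style loss over visual anchors and audio candidates, with an optional mask that drops suspected false negatives. It is not the paper's SSPL or SACL: the function names, the toy embeddings, and the rule "treat an audio clip as a false negative when its cosine similarity to the anchor's own audio exceeds `fn_threshold`" are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(visual, audio, tau=0.1, fn_threshold=None):
    """Mean InfoNCE loss; audio[i] is the positive for visual anchor i.

    Hypothetical masking rule (an assumption, not the paper's method):
    when fn_threshold is set, audio[j] is excluded from the negatives of
    anchor i if cosine(audio[i], audio[j]) > fn_threshold.
    """
    n = len(visual)
    total = 0.0
    for i in range(n):
        pos = math.exp(cosine(visual[i], audio[i]) / tau)
        denom = pos
        for j in range(n):
            if j == i:
                continue
            if fn_threshold is not None and cosine(audio[i], audio[j]) > fn_threshold:
                continue  # suspected false negative: do not push it away
            denom += math.exp(cosine(visual[i], audio[j]) / tau)
        total += -math.log(pos / denom)
    return total / n

# Toy batch: samples 0 and 1 share a sound class, sample 2 does not.
visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
audio = [[1.0, 0.1], [0.95, 0.0], [0.1, 1.0]]

plain = info_nce(visual, audio)                     # random negatives, no mask
masked = info_nce(visual, audio, fn_threshold=0.9)  # suspected false negatives dropped
print(plain, masked)
```

With random negatives, anchor 0 is penalized for being close to the audio of sample 1 even though the two share a sound class. Masking removes that term from the denominator, so the masked loss is lower on this toy batch.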

Code & Models

Repositories

zjsong/sacl (PyTorch, official)

Taxonomy

Methods: Contrastive Learning