Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

Tianyu Liu; Peng Zhang; Wei Huang; Yufei Zha; Tao You; Yanning Zhang

arXiv:2308.04767 · cs.CV · August 10, 2023

TL;DR

This paper introduces an Induction Network that effectively bridges the gap between audio and visual modalities for self-supervised sound source localization, improving alignment and robustness over previous contrastive learning methods.

Contribution

The proposed Induction Network decouples modality gradients and uses an induction vector to enhance cross-modal alignment, addressing heterogeneity issues in self-supervised learning.

Findings

01. Outperforms state-of-the-art methods on SoundNet-Flickr and VGG-Sound datasets.

02. Improves robustness with adaptive threshold selection.

03. Effectively aligns audio and visual modalities in challenging scenarios.

Abstract

Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive-learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenes. Unfortunately, insufficient attention to the influence of heterogeneity in the different modality features still limits further improvement of this scheme, which is the motivation for our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned consistently with the visual modality. In addition to a visual weighted contrastive…
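
As a rough illustration of the audio-visual correspondence the abstract refers to, the sketch below computes a cosine-similarity map between one audio embedding and a grid of visual feature vectors, then thresholds it into a binary localization mask. This is a generic formulation that is common in sound source localization, not the Induction Network itself: it contains no Induction Vector, no gradient decoupling, and no training. The function names, the toy 2x2 feature grid, and the fixed threshold of 0.5 are all assumptions made for illustration; the paper reports an adaptive threshold selection that is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-8)


def localization_map(visual_features, audio_embedding):
    """Similarity of the audio embedding to every spatial location.

    visual_features: H x W grid (nested lists) of D-dimensional vectors.
    audio_embedding: a single D-dimensional vector.
    """
    return [[cosine(cell, audio_embedding) for cell in row]
            for row in visual_features]


def binarize(sim_map, threshold):
    """Mark locations whose similarity reaches the threshold.

    The threshold here is fixed and hypothetical; it stands in for the
    adaptive selection mentioned in the findings.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in sim_map]


# Toy 2x2 visual feature grid with 2-dimensional features (made-up values).
visual = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [-1.0, 0.0]],
]
audio = [1.0, 0.0]

sim = localization_map(visual, audio)
mask = binarize(sim, threshold=0.5)
print(mask)  # [[1, 1], [0, 0]]
```

In this toy case the two top-row locations point in nearly the same direction as the audio embedding and are marked as the sound source, while the bottom-row locations are orthogonal or opposite and are rejected.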

Code & Models

Repositories

tahy1/avin (PyTorch, official)

Taxonomy

Methods: Contrastive Learning