Mix and Localize: Localizing Sound Sources in Mixtures

Xixi Hu; Ziyang Chen; Andrew Owens

arXiv:2211.15058 · cs.CV · November 29, 2022

TL;DR

This paper introduces an audio-visual method that jointly groups a sound mixture into individual sources and localizes each source in the visual scene, outperforming existing self-supervised approaches.

Contribution

It proposes a unified framework that uses a contrastive random walk on a graph, whose nodes are images and separated sounds, to simultaneously separate multiple sounds and associate each one with the visual signal.

Findings

1. Successfully localizes multiple sound sources within a single scene.

2. Outperforms other self-supervised methods in experiments.

3. Works on both musical-instrument and human-speech videos.

Abstract

We present a method for simultaneously localizing multiple sound sources within a visual scene. This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal. Our method jointly solves both tasks at once, using a formulation inspired by the contrastive random walk of Jabri et al. We create a graph in which images and separated sounds correspond to nodes, and train a random walker to transition between nodes from different modalities with high return probability. The transition probabilities for this walk are determined by an audio-visual similarity metric that is learned by our model. We show through experiments with musical instruments and human speech that our model can successfully localize multiple sounds, outperforming other self-supervised methods. Project site: https://hxixixh.github.io/mix-and-localize
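The round-trip objective in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the image nodes and separated-sound nodes have already been embedded into a shared space (the encoders and the separation step are omitted), uses cosine similarity as a stand-in for the learned audio-visual metric, and picks a hypothetical temperature of 0.07. The walker moves image → sound → image, and the loss is the cross-entropy of returning to the starting image node.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cycle_walk_loss(visual, audio, temperature=0.07):
    """Image -> sound -> image random-walk loss (illustrative sketch only).

    visual: (K, D) embeddings of K image nodes (hypothetical encoder outputs).
    audio:  (K, D) embeddings of K separated-sound nodes (hypothetical).
    temperature: assumed softmax temperature, not a value from the paper.
    Returns (loss, round_trip), where round_trip[i, j] is the probability of
    starting at image i and ending at image j after visiting one sound node.
    """
    # Cosine similarity stands in for the learned audio-visual metric.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    sim = v @ a.T  # sim[i, j]: similarity of image i and sound j

    # Row-stochastic transition matrices between the two modalities.
    p_va = softmax(sim / temperature, axis=1)    # image -> sound
    p_av = softmax(sim.T / temperature, axis=1)  # sound -> image

    # Two-step walk; every row of round_trip sums to 1.
    round_trip = p_va @ p_av

    # Encourage a high probability of returning to the start node.
    return_prob = np.diag(round_trip)
    loss = -np.mean(np.log(return_prob + 1e-12))
    return loss, round_trip


# Distinct, well-separated nodes: the walker almost always returns home.
loss_distinct, rt = cycle_walk_loss(np.eye(4), np.eye(4))
# Indistinguishable nodes: transitions are uniform, so loss is log(4).
loss_same, _ = cycle_walk_loss(np.ones((4, 3)), np.ones((4, 3)))
print(loss_distinct, loss_same)
```

One property worth noting: permuting the rows of `audio` leaves the round-trip matrix unchanged, so this objective needs no labels saying which sound belongs to which image. Minimizing it only pushes the similarity metric to make each image strongly and uniquely attached to one separated sound.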

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies