Visual Scene Graphs for Audio Source Separation

Moitreya Chatterjee; Jonathan Le Roux; Narendra Ahuja; Anoop Cherian

arXiv:2109.11955 · cs.CV · September 27, 2021

TL;DR

This paper introduces AVSGS, a deep learning model that uses visual scene graphs to improve audio source separation, especially in complex, real-world scenarios, achieving state-of-the-art results.

Contribution

The paper presents AVSGS, a novel graph-based deep learning approach that incorporates visual scene structure to improve audio source separation, and introduces a new, challenging dataset.

Findings

01

Achieves state-of-the-art separation performance on ASIW and MUSIC datasets.

02

Effectively models object interactions to distinguish similar sound sources.

03

Demonstrates robustness in natural, daily-life audio-visual scenes.

Abstract

State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments. These approaches often ignore the visual context of these sound sources or avoid modeling object interactions that may be useful to better characterize the sources, especially when the same object class may produce varied sounds from distinct interactions. To address this challenging problem, we propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs, each subgraph being associated with a unique sound obtained by co-segmenting the audio spectrogram. At its core, AVSGS uses a recursive neural network that emits mutually-orthogonal sub-graph embeddings of the visual graph using multi-head attention. These…
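To make the abstract's pipeline concrete, here is a minimal NumPy sketch, assuming a toy setting: scene-graph nodes are given as feature vectors, a simple tanh recurrence stands in for the paper's recursive neural network, multi-head attention pools nodes into one subgraph embedding per step, and each new embedding is made orthogonal to earlier ones by Gram-Schmidt projection (a swapped-in mechanism; the paper may enforce orthogonality differently, e.g. with a loss). The softmax-over-sources spectrogram masks are likewise an assumed stand-in for the co-segmentation step. All function and parameter names (`attention_pool`, `orthogonal_subgraph_embeddings`, `separate`, `w_heads`, `w_rec`, `w_mask`) are hypothetical; this is not the authors' implementation, which lives in the merlresearch/AVSGS repository.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(nodes, query, w_heads):
    """Multi-head attention pooling of scene-graph node features into one vector.

    nodes:   (N, d) node features (hypothetical visual embeddings)
    query:   (d,)   recurrent state steering which nodes to attend to
    w_heads: (H, d, d) hypothetical per-head key projections
    """
    d = nodes.shape[1]
    pooled = []
    for w in w_heads:
        keys = nodes @ w                     # (N, d) projected keys for this head
        scores = keys @ query / np.sqrt(d)   # (N,) scaled dot-product scores
        attn = softmax(scores)               # attention weights over nodes sum to 1
        pooled.append(attn @ nodes)          # (d,) attention-weighted node average
    return np.mean(pooled, axis=0)           # average the heads

def orthogonal_subgraph_embeddings(nodes, num_sources, w_heads, w_rec):
    """Recursively emit num_sources mutually orthonormal subgraph embeddings."""
    d = nodes.shape[1]
    state = np.zeros(d)
    embeddings = []
    for _ in range(num_sources):
        # Hypothetical tanh recurrence standing in for the paper's recursive network
        query = np.tanh(w_rec @ state + nodes.mean(axis=0))
        z = attention_pool(nodes, query, w_heads)
        # Modified Gram-Schmidt: remove components along earlier embeddings
        for e in embeddings:
            z = z - (z @ e) * e
        z = z / (np.linalg.norm(z) + 1e-12)  # normalize to unit length
        embeddings.append(z)
        state = z  # feed the new embedding back into the recursion
    return np.stack(embeddings)              # (K, d)

def separate(mix_mag, embeddings, w_mask):
    """Split a magnitude spectrogram into one source per subgraph embedding.

    mix_mag: (F, T) mixture magnitudes; w_mask: (F, d) hypothetical projection.
    """
    profiles = embeddings @ w_mask.T                               # (K, F) per-source profile
    logits = profiles[:, :, None] * np.log1p(mix_mag)[None, :, :]  # (K, F, T)
    masks = softmax(logits, axis=0)  # masks sum to 1 over sources at every bin
    return masks * mix_mag[None, :, :]                             # (K, F, T)

rng = np.random.default_rng(0)
nodes = rng.standard_normal((6, 8))            # 6 scene-graph nodes, 8-dim features
w_heads = rng.standard_normal((4, 8, 8)) * 0.1
w_rec = rng.standard_normal((8, 8)) * 0.1
E = orthogonal_subgraph_embeddings(nodes, 3, w_heads, w_rec)
mix = np.abs(rng.standard_normal((5, 7)))      # toy 5x7 magnitude spectrogram
sources = separate(mix, E, rng.standard_normal((5, 8)))
```

Because the masks are a softmax across sources, the separated spectrograms always sum back to the mixture, and the Gram-Schmidt step guarantees `E @ E.T` is the identity, which is one simple way to realize "mutually-orthogonal" embeddings without a training loss.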

Code & Models

Repositories

merlresearch/AVSGS
PyTorch · Official