Improving Audio-Visual Segmentation with Bidirectional Generation

Dawei Hao; Yuxin Mao; Bowen He; Xiaodong Han; Yuchao Dai; Yiran Zhong

arXiv:2308.08288 · cs.CV · December 20, 2023 · 1 citation

TL;DR

This paper introduces a bidirectional generation framework for audio-visual segmentation that models the correlation between visual features and sounds, improving accuracy especially in complex scenes with multiple sound sources.

Contribution

It proposes a novel bidirectional generation approach with visual-to-audio projection and volumetric motion estimation, advancing AVS performance on benchmark datasets.

Findings

01. Achieved new state-of-the-art results on AVSBench, especially on the MS3 subset.
02. Demonstrated the effectiveness of bidirectional modeling in AVS.
03. Improved handling of temporal dynamics with a motion estimation module.

Abstract

The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the contribution of each modality is implicitly or explicitly modeled. Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. This framework establishes robust correlations between an object's visual characteristics and its associated sound, thereby enhancing the performance of AVS. To achieve this, we employ a visual-to-audio projection component that reconstructs audio features from object segmentation masks and minimizes…
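The visual-to-audio projection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the masked average pooling, the linear projection, the helper names, and the toy numbers are all assumptions made for illustration. The abstract is cut off before it names the quantity that is minimized, so the mean-squared error used here is likewise only one plausible choice.

```python
from typing import List

Vector = List[float]


def masked_average_pool(features: List[List[Vector]], mask: List[List[float]]) -> Vector:
    """Average per-pixel feature vectors, weighted by a soft mask in [0, 1]."""
    channels = len(features[0][0])
    pooled = [0.0] * channels
    total = 0.0
    for feat_row, mask_row in zip(features, mask):
        for feat, weight in zip(feat_row, mask_row):
            total += weight
            for c in range(channels):
                pooled[c] += weight * feat[c]
    if total == 0.0:
        return pooled  # empty mask: return the zero vector
    return [value / total for value in pooled]


def project(vector: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical linear map from visual channels to audio-feature dimensions."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    """Mean-squared error; an assumed stand-in for the reconstruction objective."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy 2x2 frame with 2 visual channels: the top row is the sounding object.
features = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
# Assumed projection weights (3 audio dimensions x 2 visual channels).
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_feature = [1.0, 0.0, 1.0]

good_mask = [[1.0, 1.0], [0.0, 0.0]]  # covers the sounding object
bad_mask = [[0.0, 0.0], [1.0, 1.0]]   # covers the silent region

loss_good = mse(project(masked_average_pool(features, good_mask), weights), audio_feature)
loss_bad = mse(project(masked_average_pool(features, bad_mask), weights), audio_feature)
print(loss_good, loss_bad)
```

In this toy setup, the mask that covers the sounding object reconstructs the audio feature with a lower error than the mask that covers the silent region, which is the kind of signal a reconstruction objective could feed back to the segmentation model.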

Code & Models

Repositories

opennlplab/avs-bidirectional
PyTorch · Official

Videos

Improving Audio-Visual Segmentation with Bidirectional Generation · Underline

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Image Enhancement Techniques