Audio-Visual Segmentation
Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

TL;DR
This paper introduces the novel task of audio-visual segmentation, creates a benchmark dataset, and proposes a new method that integrates audio and visual information for pixel-level object segmentation in videos.
Contribution
It defines the AVS problem, constructs the first AVS benchmark dataset, and proposes a novel audio-visual interaction method for segmentation.
Findings
On AVSBench, the proposed method outperforms existing methods adapted from related tasks.
The AVSBench dataset provides pixel-wise annotations for the sounding objects in audible videos.
The approach effectively links audio semantics with visual segmentation.
Abstract
We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on…
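The idea of injecting audio semantics into the visual segmentation features can be illustrated with a small sketch. This is not the paper's module: it is a simplified cross-modal attention written for illustration, and the function name `audio_guided_fusion`, the tensor shapes, and the assumption that audio embeddings already share the visual channel width are all hypothetical. Each pixel of each frame attends over the audio embeddings of all frames and adds the resulting audio summary to its own feature vector.

```python
import numpy as np


def audio_guided_fusion(visual, audio):
    """Fuse per-frame audio embeddings into per-pixel visual features.

    visual: array of shape (T, H, W, C), one feature vector per pixel per frame.
    audio:  array of shape (T, C), one embedding per frame (assumed to be
            already projected to the same channel width C as the visual features).
    Returns an array of shape (T, H, W, C).
    """
    T, H, W, C = visual.shape
    pixels = visual.reshape(T * H * W, C)          # every pixel of every frame
    scores = pixels @ audio.T / np.sqrt(C)         # (T*H*W, T) pixel-to-audio similarity
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the T audio steps
    audio_context = weights @ audio                # (T*H*W, C) audio summary per pixel
    fused = pixels + audio_context                 # residual add keeps the visual content
    return fused.reshape(T, H, W, C)


# Toy example with random features: 5 frames, 4x4 feature map, 8 channels.
rng = np.random.default_rng(0)
visual = rng.standard_normal((5, 4, 4, 8))
audio = rng.standard_normal((5, 8))
fused = audio_guided_fusion(visual, audio)
print(fused.shape)
```

In a full model the fused features would feed a segmentation decoder that predicts the per-frame mask; the residual form means that a silent (all-zero) audio embedding leaves the visual features unchanged.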
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
