BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Chen Liu, Peike Li, Hu Zhang, Lincheng Li, Zi Huang, Dadong Wang, and Xin Yu

TL;DR
This paper introduces BAVS, a two-stage framework that improves audio-visual segmentation by integrating foundation knowledge to effectively distinguish real sound sources from background noise and off-screen sounds.
Contribution
The proposed BAVS method explicitly models audio-visual correspondences and incorporates a hierarchical semantic integration strategy to enhance sound source localization in noisy environments.
Findings
Outperforms existing methods on AVS datasets
Effectively handles background noise and off-screen sounds
Improves sound localization accuracy in real-world scenarios
Abstract
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding sources by predicting pixel-wise maps. Previous methods assume that each sound component in an audio signal always has a visual counterpart in the image. However, this assumption overlooks that off-screen sounds and background noise often contaminate the audio recordings in real-world scenarios. They impose significant challenges on building a consistent semantic mapping between audio and visual signals for AVS models and thus impede precise sound localization. In this work, we propose a two-stage bootstrapping audio-visual segmentation framework by incorporating multi-modal foundation knowledge. In a nutshell, our BAVS is designed to eliminate the interference of background noise or off-screen sounds in segmentation by establishing the audio-visual correspondences in an explicit manner. In the first…
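The idea of establishing audio-visual correspondences explicitly can be illustrated with a toy example. The sketch below is not the BAVS pipeline, whose details are not given here. It only assumes that some upstream step has already produced labeled candidate masks for the image and per-label confidence scores for the audio. The function name `select_sounding_mask`, the label strings, and the 0.5 threshold are all hypothetical. A segment is kept only if its label is both visible and confidently heard, so an off-screen sound has nothing to attach to and a silent object is dropped.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask, 1 = pixel belongs to the segment


def select_sounding_mask(
    segments: Dict[str, Mask],
    audio_tags: Dict[str, float],
    threshold: float = 0.5,
) -> Mask:
    """Union the masks of visible segments whose label is also confidently heard.

    Illustrative only: `segments` and `audio_tags` stand in for the outputs of
    hypothetical visual and audio models; masks are assumed non-empty and of
    equal size.
    """
    if not segments:
        return []
    first = next(iter(segments.values()))
    height, width = len(first), len(first[0])
    output = [[0] * width for _ in range(height)]
    for label, mask in segments.items():
        # An off-screen sound has no entry in `segments`, so it is never visited;
        # a visible but silent object fails the score check and is skipped.
        if audio_tags.get(label, 0.0) < threshold:
            continue
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    output[y][x] = 1
    return output


# Hypothetical inputs: two visible objects, three audio tags.
segments = {
    "dog": [[1, 1, 0], [0, 0, 0]],
    "guitar": [[0, 0, 0], [0, 1, 1]],
}
audio_tags = {"dog": 0.9, "siren": 0.8, "guitar": 0.1}

result = select_sounding_mask(segments, audio_tags)
print(result)
```

In this example the siren is audible but has no visual counterpart, and the guitar is visible but nearly silent, so only the dog's pixels remain in the output mask.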
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Hearing Loss and Rehabilitation
