Selective Noise Suppression and Discriminative Mutual Interaction for Robust Audio-Visual Segmentation

Kai Peng, Yunzhe Shen, Miao Zhang, Leiye Liu, Yidong Han, Wei Ji, Jingjing Li, Yongri Piao, and Huchuan Lu

arXiv:2603.14203 · cs.CV · March 25, 2026

Open Access

TL;DR

This paper introduces SDAVS, an audio-visual segmentation model that suppresses audio noise and strengthens the interaction between the audio and visual modalities, improving segmentation in complex scenes.

Contribution

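The paper proposes the Selective Noise-Resilient Processor (SNRP) module and the Discriminative Audio-Visual Mutual Fusion (DAMF) strategy. SNRP mitigates audio noise by selectively emphasizing relevant auditory cues, and DAMF makes the audio and visual representations more consistent.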

Findings

01. Achieves state-of-the-art results on AVS benchmarks.

02. Effective noise suppression in multi-source scenes.

03. Enhanced audio-visual representation consistency.

Abstract

The ability to capture and segment sounding objects in dynamic visual scenes is crucial for the development of Audio-Visual Segmentation (AVS) tasks. While significant progress has been made in this area, the interaction between audio and visual modalities still requires further exploration. In this work, we aim to answer the following questions: How can a model effectively suppress audio noise while enhancing relevant audio information? How can we achieve discriminative interaction between the audio and visual modalities? To this end, we propose SDAVS, equipped with the Selective Noise-Resilient Processor (SNRP) module and the Discriminative Audio-Visual Mutual Fusion (DAMF) strategy. The proposed SNRP mitigates audio noise interference by selectively emphasizing relevant auditory cues, while DAMF ensures more consistent audio-visual representations. Experimental results demonstrate…

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation