Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang, Jinchao Zhu, Feng Dong, Shuyue Zhu

TL;DR
The paper introduces PMCANet, a novel attention-based network for audio-visual segmentation that effectively integrates multimodal information with reduced computational costs, outperforming existing methods.
Contribution
It proposes a progressive confident masking attention mechanism and an efficient cross-attention module to improve AVS performance and resource efficiency.
Findings
Outperforms existing AVS methods in accuracy.
Uses less computational resources than prior approaches.
Effectively leverages multi-stage outputs for better segmentation.
Abstract
Audio and visual signals typically occur simultaneously, and humans possess an innate ability to correlate and synchronize information from these two modalities. Recently, a challenging problem known as Audio-Visual Segmentation (AVS) has emerged, intending to produce segmentation maps for sounding objects within a scene. However, the methods proposed so far have not sufficiently integrated audio and visual information, and the computational costs have been extremely high. Additionally, the outputs of different stages have not been fully utilized. To facilitate this research, we introduce a novel Progressive Confident Masking Attention Network (PMCANet). It leverages attention mechanisms to uncover the intrinsic correlations between audio signals and visual frames. Furthermore, we design an efficient and effective cross-attention module to enhance semantic perception by selecting query…
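As a rough illustration of the attention idea the abstract describes, here is a minimal sketch of generic scaled dot-product cross-attention in which visual tokens act as queries over audio tokens. It is not the paper's module: the function names, token shapes, and toy values are assumptions made for this example, learned projections are omitted, and the query-selection step is left out because the abstract is truncated before it is described.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of query vectors (here, hypothetical visual tokens)
    keys, values: lists of key/value vectors (here, hypothetical audio tokens)
    Returns (fused, weights): one fused vector and one weight row per query.
    """
    d = len(queries[0])
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        fused.append([sum(wj * v[c] for wj, v in zip(w, values))
                      for c in range(len(values[0]))])
    return fused, weights


# Toy inputs (assumed values, not data from the paper).
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_tokens = [[2.0, 0.0], [0.0, 2.0]]

fused, weights = cross_attention(visual_tokens, audio_tokens, audio_tokens)
```

Each row of `weights` sums to 1, so every visual token receives a convex combination of the audio tokens. A real model would apply learned query, key, and value projections before this step.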
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Digital Media Forensic Detection
Methods: Softmax · Concatenated Skip Connection
