Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun, Honggang Zhang, Di Hu

TL;DR
This paper investigates biases in audio-visual segmentation models, categorizes them using synthetic data, and proposes targeted methods to mitigate these biases, leading to improved model robustness and performance.
Contribution
It introduces a novel analysis of audio priming bias and visual prior in AVS models, and develops specific modules and training strategies to mitigate these biases without altering the model structure.
Findings
Biases significantly affect segmentation quality.
Proposed methods effectively reduce biases.
Achieved competitive results on AVS benchmarks.
Abstract
Community researchers have developed a range of advanced audio-visual segmentation models aimed at improving the quality of sounding objects' masks. While masks created by these models may initially appear plausible, they occasionally exhibit anomalies with incorrect grounding logic. We attribute this to inherent real-world preferences and distributions, which offer a simpler learning signal than complex audio-visual grounding and lead models to disregard important modality information. These anomalous phenomena are often complex and cannot be directly observed in a systematic way. In this study, we make a pioneering effort to use suitable synthetic data to categorize and analyze these phenomena as two types, "audio priming bias" and "visual prior," according to the source of the anomalies. For audio priming bias, to enhance audio sensitivity to different intensities and semantics, a…
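The abstract's point that a model can produce plausible masks while disregarding a modality can be illustrated with a simple probe. The sketch below is not the paper's method: it is a hypothetical "silent-audio" check that compares the mask predicted with the real audio against the mask predicted with silence. The names `mask_iou`, `audio_insensitivity`, and the two toy models are assumptions made up for illustration, and a real AVS model would take video frames and audio features rather than nested lists.

```python
from typing import Callable, List, Sequence

Mask = List[List[int]]
Frame = List[List[float]]
Model = Callable[[Frame, Sequence[float]], Mask]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else inter / union


def audio_insensitivity(predict: Model, frame: Frame, audio: Sequence[float]) -> float:
    """IoU between the mask with real audio and the mask with silence.

    A value near 1.0 suggests the prediction barely depends on the audio.
    """
    silent = [0.0] * len(audio)
    return mask_iou(predict(frame, audio), predict(frame, silent))


# Hypothetical toy models, for illustration only.
def visual_only(frame: Frame, audio: Sequence[float]) -> Mask:
    # Ignores the audio and segments salient pixels.
    return [[1 if p > 0.5 else 0 for p in row] for row in frame]


def audio_aware(frame: Frame, audio: Sequence[float]) -> Mask:
    # Predicts an empty mask when there is no audio energy.
    energy = sum(x * x for x in audio)
    if energy == 0:
        return [[0 for _ in row] for row in frame]
    return visual_only(frame, audio)


frame = [[0.9, 0.1], [0.8, 0.2]]
audio = [0.5, -0.5, 0.3]
score_visual_only = audio_insensitivity(visual_only, frame, audio)
score_audio_aware = audio_insensitivity(audio_aware, frame, audio)
```

In this toy setup the audio-ignoring model keeps the same mask under silence, while the audio-aware model's mask changes completely, so the probe separates the two behaviours. On real data one would average the score over many clips before drawing any conclusion.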
Taxonomy
Methods: Sparse Evolutionary Training
