Test-Time Adaptation For Speech Enhancement Via Mask Polarization
Tobias Raichle, Erfan Amini, Bin Yang

TL;DR
This paper introduces mask polarization (MPol), a lightweight test-time adaptation method that improves speech enhancement models' robustness to domain shifts by restoring mask bimodality without extra parameters.
Contribution
The paper proposes MPol, a parameter-free TTA technique that improves the performance of speech enhancement models under unseen domain conditions.
Findings
MPol consistently improves speech enhancement across various domain shifts.
MPol is resource-efficient and suitable for edge deployment.
MPol achieves results competitive with significantly more complex methods.
Abstract
Adapting speech enhancement (SE) models to unseen environments is crucial for practical deployments, yet test-time adaptation (TTA) for SE remains largely under-explored due to a lack of understanding of how SE models degrade under domain shifts. We observe that mask-based SE models lose confidence under domain shifts, with predicted masks becoming flattened and losing decisive speech preservation and noise suppression. Based on this insight, we propose mask polarization (MPol), a lightweight TTA method that restores mask bimodality through distribution comparison using the Wasserstein distance. MPol requires no additional parameters beyond the trained model, making it suitable for resource-constrained edge deployments. Experimental results across diverse domain shifts and architectures demonstrate that MPol achieves very consistent gains that are competitive with significantly more…
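As an illustration of the idea only, not the authors' implementation, the sketch below scores how far a predicted mask is from a polarized one, using the 1-D Wasserstein-1 distance between two equal-size empirical samples. The abstract does not specify the target distribution or the exact loss, so the two-point target on {0, 1}, the `speech_fraction` parameter, and all function names are assumptions made for this example.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples in one dimension, the optimal transport plan
    pairs the sorted values, so W1 is the mean absolute difference
    between the two sorted sequences.
    """
    a, b = sorted(a), sorted(b)
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def bimodal_target(n, speech_fraction):
    """Hypothetical polarized target: n mask values that are exactly 0 or 1.

    speech_fraction is an assumed knob for the share of time-frequency
    bins that should be kept (mask value 1); it is not taken from the paper.
    """
    n_ones = int(round(n * speech_fraction))
    return [0.0] * (n - n_ones) + [1.0] * n_ones


def polarization_score(mask_values, speech_fraction=0.5):
    """Distance of a flattened mask from the assumed bimodal target.

    0.0 means the mask is already fully polarized for this target;
    larger values mean a flatter, less decisive mask.
    """
    values = list(mask_values)
    return wasserstein_1d(values, bimodal_target(len(values), speech_fraction))


# A flattened ("unconfident") mask versus a decisive one.
flat_mask = [0.5] * 16
sharp_mask = [0.0, 1.0, 1.0, 0.0]

flat_score = polarization_score(flat_mask)    # 0.5
sharp_score = polarization_score(sharp_mask)  # 0.0
```

In a real test-time adaptation loop, a differentiable version of such a score would presumably be minimized by updating the trained model on unlabeled test audio; that step is left out here because the abstract gives no details about it.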
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis
