Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio
Li Li, Shogo Seki

TL;DR
This paper improves domain adaptation-based speech enhancement by addressing data imbalance in signal-to-noise ratio (SNR) through curriculum learning, leading to better performance on underrepresented acoustic conditions.
Contribution
It demonstrates the impact of SNR imbalance on speech enhancement and proposes a curriculum learning approach to mitigate this issue in domain adaptation methods.
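The summary does not say how the curriculum is scheduled. The sketch below is only an illustration and assumes an easy-to-hard ordering: training starts with high-SNR pseudo pairs, and the lower SNR bound is lowered linearly until the full range is covered. The function names (`snr_db`, `scale_noise_to_snr`, `curriculum_snr_range`, `sample_target_snr`) and the dB ranges are hypothetical and do not come from the authors' code.

```python
import math
import random


def snr_db(speech, noise):
    """SNR in dB between two equal-length signals given as lists of floats."""
    p_s = sum(x * x for x in speech)
    p_n = sum(x * x for x in noise)
    return 10.0 * math.log10(p_s / p_n)


def scale_noise_to_snr(speech, noise, target_db):
    """Return a rescaled copy of `noise` so the speech-to-noise SNR equals target_db."""
    # Scaling the noise amplitude by `gain` lowers the SNR by 20*log10(gain) dB.
    gain = 10.0 ** ((snr_db(speech, noise) - target_db) / 20.0)
    return [gain * x for x in noise]


def curriculum_snr_range(epoch, num_epochs, easy=(10.0, 20.0), full=(-5.0, 20.0)):
    """Hypothetical easy-to-hard schedule (an assumption, not the paper's schedule).

    The SNR range moves linearly from `easy` at epoch 0 to `full` at the last
    epoch, so low-SNR pseudo pairs are introduced gradually.
    """
    t = min(epoch / max(num_epochs - 1, 1), 1.0)
    low = easy[0] + t * (full[0] - easy[0])
    high = easy[1] + t * (full[1] - easy[1])
    return low, high


def sample_target_snr(epoch, num_epochs, rng):
    """Draw a target SNR uniformly from the current curriculum range."""
    low, high = curriculum_snr_range(epoch, num_epochs)
    return rng.uniform(low, high)


# Usage: rescale one noise signal to curriculum-controlled SNRs at several epochs.
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for epoch in (0, 5, 9):
    target = sample_target_snr(epoch, 10, rng)
    scaled = scale_noise_to_snr(speech, noise, target)
    mixture = [s + n for s, n in zip(speech, scaled)]  # pseudo noisy student input
    print(epoch, curriculum_snr_range(epoch, 10), round(snr_db(speech, scaled), 2), len(mixture))
```

Sampling uniformly inside the current range keeps the SNRs balanced within each stage. How the range itself should evolve is the design choice a real curriculum would need to tune.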
Findings
SNR imbalance significantly affects model performance.
Curriculum learning improves enhancement for underrepresented SNRs.
Experiments on the CHiME-7 UDASE dataset support the approach.
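A simple way to expose SNR imbalance in a set of pseudo pairs is to bin each (speech, noise) pair by its SNR and compare the bin counts. The following minimal sketch assumes signals are plain lists of float samples and uses 5 dB bins. `snr_histogram` and `imbalance_ratio` are hypothetical helpers and are not the paper's analysis procedure.

```python
import math
from collections import Counter


def snr_db(speech, noise, eps=1e-12):
    """SNR in dB; `eps` guards against division by zero for silent signals."""
    p_s = sum(x * x for x in speech)
    p_n = sum(x * x for x in noise)
    return 10.0 * math.log10((p_s + eps) / (p_n + eps))


def snr_histogram(pairs, bin_width=5.0):
    """Count (speech, noise) pairs per SNR bin; each key is the bin's lower edge in dB."""
    counts = Counter()
    for speech, noise in pairs:
        edge = math.floor(snr_db(speech, noise) / bin_width) * bin_width
        counts[edge] += 1
    return dict(sorted(counts.items()))


def imbalance_ratio(hist):
    """Ratio of the most to the least populated non-empty bin (1.0 means balanced)."""
    return max(hist.values()) / min(hist.values())


# Usage: three pairs near 12 dB and one at 2 dB give a 3:1 imbalance.
pairs = [([1.0] * 8, [10 ** (-snr / 20)] * 8) for snr in (12.0, 12.5, 13.0, 2.0)]
hist = snr_histogram(pairs)
print(hist, imbalance_ratio(hist))  # {0.0: 1, 10.0: 3} 3.0
```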
Abstract
RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained with full supervision to generate pseudo-paired data by remixing the teacher model's outputs. The student model, which enhances real-world recorded signals, is trained on the pseudo-paired data without ground truth. Because the noisy signals are recorded in natural environments, the dataset inevitably suffers from data imbalance in some acoustic properties, leading to subpar performance on the underrepresented data. The signal-to-noise ratio (SNR), which is inherently balanced in supervised learning, is a prime example. In this paper, we provide empirical evidence, using the dataset of the CHiME-7 UDASE task, that the SNR of pseudo data has a significant impact on model performance, highlighting the importance of balanced SNR in DASE. Furthermore, we propose adopting curriculum…
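The remixing step in RemixIT can be sketched roughly as follows. The teacher splits each noisy recording into speech and noise estimates. The noise estimates are then permuted within the batch, and each speech estimate is added to a permuted noise estimate. Each resulting pseudo mixture uses the teacher's own estimates as its targets. This is a minimal sketch only. `toy_teacher` is a hypothetical placeholder for a trained teacher network, signals are plain lists of floats, and the Remixed2Remixed variant is not shown.

```python
import random


def toy_teacher(mixture):
    """Hypothetical stand-in for a supervised teacher: returns (speech_est, noise_est).

    A real teacher would be a trained enhancement network; this stub only
    splits the input so that speech_est + noise_est == mixture.
    """
    speech = [0.7 * x for x in mixture]
    noise = [x - s for x, s in zip(mixture, speech)]
    return speech, noise


def remix_batch(speech_est, noise_est, rng):
    """Permute noise estimates within the batch and add them to speech estimates.

    Returns (pseudo_mixture, speech_target, noise_target) triples; the targets
    are teacher outputs, not ground truth.
    """
    perm = list(range(len(noise_est)))
    rng.shuffle(perm)
    pseudo = []
    for i, j in enumerate(perm):
        s, n = speech_est[i], noise_est[j]
        mixture = [a + b for a, b in zip(s, n)]
        pseudo.append((mixture, s, n))
    return pseudo


# Usage: a batch of four random "recordings" of 16 samples each.
rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
estimates = [toy_teacher(x) for x in batch]
speech_est = [s for s, _ in estimates]
noise_est = [n for _, n in estimates]
pairs = remix_batch(speech_est, noise_est, rng)
print(len(pairs), len(pairs[0][0]))
```

Nothing in this step controls the SNR of the pseudo mixtures, so their SNR distribution can inherit the imbalance of the recorded data. That is the property the paper studies.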
Taxonomy
Topics: Speech and Audio Processing
