Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio

Li Li; Shogo Seki

arXiv:2406.13982·cs.SD·June 21, 2024

TL;DR

This paper improves domain adaptation-based speech enhancement by addressing data imbalance in signal-to-noise ratio (SNR) through curriculum learning, leading to better performance on underrepresented acoustic conditions.

Contribution

The paper demonstrates how SNR imbalance affects speech enhancement and proposes a curriculum learning approach to mitigate the issue in domain adaptation methods.

Findings

01 SNR imbalance significantly affects model performance.

02 Curriculum learning improves enhancement for underrepresented SNRs.

03 Empirical evidence on the dataset of the CHiME-7 UDASE task supports the approach.

Abstract

RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained with full supervision to generate pseudo-paired data by remixing the teacher's outputs. The student model, which enhances real-world recordings, is trained on this pseudo-paired data without ground truth. Because the noisy signals are recorded in natural environments, the dataset inevitably suffers from imbalance in some acoustic properties, leading to subpar performance on underrepresented data. The signal-to-noise ratio (SNR), inherently balanced in supervised learning, is a prime example. In this paper, we provide empirical evidence, using the dataset of the CHiME-7 UDASE task, that the SNR of pseudo data has a significant impact on model performance, highlighting the importance of balanced SNR in DASE. Furthermore, we propose adopting curriculum…
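As a rough illustration of the remixing idea described in the abstract, the sketch below pairs each teacher speech estimate with a noise estimate drawn from elsewhere in the batch and then measures the SNR of the resulting pseudo pairs. It is a minimal sketch, not the authors' implementation: the function names `snr_db` and `remix`, the batch-wise permutation, and the synthetic Gaussian signals standing in for teacher outputs are all assumptions made for illustration.

```python
import numpy as np


def snr_db(speech, noise, eps=1e-12):
    """Per-utterance SNR in dB: 10 * log10(speech power / noise power)."""
    speech_power = np.mean(speech ** 2, axis=-1)
    noise_power = np.mean(noise ** 2, axis=-1)
    return 10.0 * np.log10((speech_power + eps) / (noise_power + eps))


def remix(speech_est, noise_est, rng):
    """Hypothetical remixing step: shuffle noise estimates across the batch
    and add them back to the speech estimates to form pseudo mixtures."""
    perm = rng.permutation(len(noise_est))
    shuffled_noise = noise_est[perm]
    mixtures = speech_est + shuffled_noise
    return mixtures, speech_est, shuffled_noise


# Synthetic stand-ins for teacher outputs (assumption: 4 utterances, 1 s at 16 kHz).
rng = np.random.default_rng(0)
gains = np.array([[1.0], [0.5], [0.1], [0.05]])
speech_est = rng.standard_normal((4, 16000)) * gains
noise_est = rng.standard_normal((4, 16000)) * 0.1

mixtures, targets, noises = remix(speech_est, noise_est, rng)
pseudo_snr = snr_db(targets, noises)  # one SNR value per pseudo pair
print(np.round(pseudo_snr, 1))
```

Collecting `pseudo_snr` over a whole dataset and inspecting its histogram is one way to check whether some SNR ranges are underrepresented in the pseudo data.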


Taxonomy

Topics: Speech and Audio Processing