Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments
Philippe Gonzalez, Tommy Sonne Alstrøm, Tobias May

TL;DR
This paper introduces a framework to measure how well speech enhancement systems generalize across different noisy and reverberant environments, revealing that performance often drops in unseen conditions, especially with speech mismatches.
Contribution
The study proposes a novel generalization assessment framework using a reference model to quantify the generalization gap in speech enhancement systems across diverse conditions.
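As a rough illustration of how a reference model can turn raw scores into a gap, the sketch below computes a relative performance drop. This summary does not give the paper's exact definition, so everything here is an assumption: the function name `generalization_gap` is hypothetical, the reference model is assumed to have been trained on the test condition itself, and both scores are assumed to come from the same higher-is-better metric.

```python
def generalization_gap(mismatched_score: float, reference_score: float) -> float:
    """Relative drop, in percent, of a model tested on unseen conditions
    compared with a reference model evaluated on the same test condition.

    Assumption (not taken from the paper): both scores use one
    higher-is-better metric, and the reference score is nonzero.
    """
    if reference_score == 0:
        raise ValueError("reference_score must be nonzero")
    # 0 % means the model matches the reference; larger values mean a wider gap.
    return 100.0 * (reference_score - mismatched_score) / reference_score


# Made-up scores for illustration only; these are not results from the paper.
gap = generalization_gap(mismatched_score=1.5, reference_score=2.0)
print(f"generalization gap: {gap:.1f} %")
```

Normalizing by the reference score is one way to keep a hard test database from being mistaken for poor generalization, since the reference model faces the same test condition.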
Findings
Performance degrades most under a speech mismatch, that is, when the test speech comes from a database not used during training.
Training on multiple databases improves noise and room generalization.
Recent models perform worse than simpler systems in mismatched conditions.
Abstract
The acoustic variability of noisy and reverberant speech mixtures is influenced by multiple factors, such as the spectro-temporal characteristics of the target speaker and the interfering noise, the signal-to-noise ratio (SNR) and the room characteristics. This large variability poses a major challenge for learning-based speech enhancement systems, since a mismatch between the training and testing conditions can substantially reduce the performance of the system. Generalization to unseen conditions is typically assessed by testing the system with a new speech, noise or binaural room impulse response (BRIR) database different from the one used during training. However, the difficulty of the speech enhancement task can change across databases, which can substantially influence the results. The present study introduces a generalization assessment framework that uses a reference model…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Underwater Acoustics Research
