Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments

Philippe Gonzalez; Tommy Sonne Alstrøm; Tobias May

arXiv:2309.06183·eess.AS·November 9, 2023

Open Access

TL;DR

This paper introduces a framework to measure how well speech enhancement systems generalize across different noisy and reverberant environments. It shows that performance often drops in unseen conditions, most of all when the speech material differs between training and testing.

Contribution

The study proposes a novel generalization assessment framework using a reference model to quantify the generalization gap in speech enhancement systems across diverse conditions.
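As a rough illustration of the idea, the sketch below compares the score of a model tested on unseen data against the score of a reference model trained on the test condition, and expresses the difference relative to the reference. The function name `generalization_gap`, the relative-difference formula, and the example scores are assumptions made for illustration; they are not necessarily the exact definition or any result from the paper.

```python
def generalization_gap(mismatched_score: float, reference_score: float) -> float:
    """Relative gap, in percent, between two scores on the same test condition.

    mismatched_score: score of a model trained on other data (hypothetical input).
    reference_score:  score of a reference model trained on the test condition,
                      used here as a proxy for how hard the test condition is.

    ASSUMPTION: the relative-difference form below is one plausible way to
    normalize by task difficulty, not necessarily the paper's definition.
    """
    if reference_score == 0:
        raise ValueError("reference_score must be non-zero")
    return 100.0 * (mismatched_score - reference_score) / abs(reference_score)


# Illustrative, made-up scores (higher is better), not measured results.
example_gap = generalization_gap(mismatched_score=2.0, reference_score=4.0)
print(f"generalization gap: {example_gap:.1f}%")
```

Under this assumed definition, a gap of zero means the model does as well as the reference model on the test condition, and a negative value means it does worse. Dividing by the reference score is what keeps a harder test database from being mistaken for poorer generalization.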

Findings

01. Performance degrades most with speech mismatches.

02. Training on multiple databases improves noise and room generalization.

03. Recent models perform worse than simpler systems in mismatched conditions.

Abstract

The acoustic variability of noisy and reverberant speech mixtures is influenced by multiple factors, such as the spectro-temporal characteristics of the target speaker and the interfering noise, the signal-to-noise ratio (SNR) and the room characteristics. This large variability poses a major challenge for learning-based speech enhancement systems, since a mismatch between the training and testing conditions can substantially reduce the performance of the system. Generalization to unseen conditions is typically assessed by testing the system with a new speech, noise or binaural room impulse response (BRIR) database different from the one used during training. However, the difficulty of the speech enhancement task can change across databases, which can substantially influence the results. The present study introduces a generalization assessment framework that uses a reference model…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Underwater Acoustics Research