Lessons Learned from the URGENT 2024 Speech Enhancement Challenge

Wangyou Zhang; Kohei Saijo; Samuele Cornell; Robin Scheibler; Chenda Li; Zhaoheng Ni; Anurag Kumar; Marvin Sach; Wei Wang; Yihui Fu; Shinji Watanabe; Tim Fingscheidt; Yanmin Qian

arXiv:2506.01611 · eess.AS · June 3, 2025

TL;DR

This paper analyzes the outcomes of the URGENT 2024 Speech Enhancement Challenge, focusing on data cleaning and evaluation metrics, to guide the development of more robust and generalizable speech enhancement systems.

Contribution

It provides an in-depth analysis of data cleaning issues and evaluation metrics, highlighting overlooked problems and proposing comprehensive evaluation strategies for speech enhancement.

Findings

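1. Mismatches between declared and effective audio bandwidth, along with label noise, degrade data quality even in "high-quality" speech corpora.

2. Current SE systems struggle in the hardest conditions, such as speech overlap and strong noise or reverberation.

3. Combining multiple metrics improves correlation with human judgment.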

Abstract

The URGENT 2024 Challenge aims to foster speech enhancement (SE) techniques with great universality, robustness, and generalizability, featuring a broader task definition, large-scale multi-domain data, and comprehensive evaluation metrics. Nourished by the challenge outcomes, this paper presents an in-depth analysis of two key, yet understudied, issues in SE system development: data cleaning and evaluation metrics. We highlight several overlooked problems in traditional SE pipelines: (1) mismatches between declared and effective audio bandwidths, along with label noise even in various "high-quality" speech corpora; (2) lack of both effective SE systems to conquer the hardest conditions (e.g., speech overlap, strong noise / reverberation) and reliable measure of speech sample difficulty; (3) importance of combining multifaceted metrics for a comprehensive evaluation correlating well…
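
To make point (1) of the abstract concrete, here is a minimal sketch of how a gap between declared and effective bandwidth could be detected. It is not the authors' procedure: the function name `effective_bandwidth_hz`, the cumulative-energy heuristic, and the `energy_fraction` and `n_fft` defaults are all assumptions chosen for illustration. The idea is that a file stored at 48 kHz but upsampled from 16 kHz material has almost no energy above 8 kHz, so the frequency below which nearly all spectral energy lies falls far short of the declared Nyquist frequency.

```python
import numpy as np


def effective_bandwidth_hz(signal, sample_rate, energy_fraction=0.999, n_fft=1024):
    """Return the frequency (Hz) below which `energy_fraction` of the
    long-term spectral energy lies. Heuristic sketch, not the paper's method."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)))
    hop = n_fft // 2
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop

    # Accumulate the power spectrum over all half-overlapping frames.
    power = np.zeros(n_fft // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + n_fft] * window
        power += np.abs(np.fft.rfft(frame)) ** 2

    total = power.sum()
    if total == 0.0:
        return 0.0  # silent input: no measurable bandwidth

    # First frequency bin at which the cumulative energy reaches the threshold.
    cumulative = np.cumsum(power) / total
    idx = min(int(np.searchsorted(cumulative, energy_fraction)), len(power) - 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return float(freqs[idx])


# Synthetic demo (toy data, not from the challenge): white noise declared at
# 48 kHz, and a copy band-limited to 8 kHz to mimic upsampled 16 kHz content.
rng = np.random.default_rng(0)
declared_sr = 48000
full_band = rng.standard_normal(declared_sr)
spectrum = np.fft.rfft(full_band)
bin_freqs = np.fft.rfftfreq(len(full_band), d=1.0 / declared_sr)
spectrum[bin_freqs > 8000] = 0.0
band_limited = np.fft.irfft(spectrum, n=len(full_band))

full_bw = effective_bandwidth_hz(full_band, declared_sr)
narrow_bw = effective_bandwidth_hz(band_limited, declared_sr)
print(f"full-band estimate: {full_bw:.0f} Hz, band-limited estimate: {narrow_bw:.0f} Hz")
```

A cumulative-energy threshold is only one possible choice. Real speech has a steep spectral tilt, so a practical detector would likely need a per-band energy floor or a different threshold than the one assumed here.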

Code & Models

Repositories

urgent-challenge/urgent2024_analysis
PyTorch · Official

Datasets

urgent-challenge/urgent2024-sqa
dataset · 84 downloads

Taxonomy

Topics: Speech and Audio Processing