URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

TL;DR
The URGENT challenge aims to advance universal speech enhancement by unifying various sub-tasks into a single framework, using diverse data and comprehensive metrics to evaluate robustness and generalizability.
Contribution
It introduces a new challenge and a unified framework for multiple speech enhancement sub-tasks, promoting research on universal, robust, and generalizable SE models.
Findings
Preliminary baseline experiments reveal insights into model performance.
Diverse evaluation data highlights robustness challenges.
Unified framework enables comprehensive comparison of SE approaches.
Abstract
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generalizability of SE. We aim to extend the SE definition to cover different sub-tasks to explore the limits of SE models, starting from denoising, dereverberation, bandwidth extension, and declipping. A novel framework is proposed to unify all these sub-tasks in a single model, allowing the use of all existing SE approaches. We collected public speech and noise data from different domains to construct diverse evaluation data. Finally, we discuss the insights gained from our preliminary baseline…
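To make the four sub-tasks named in the abstract concrete, the following is a minimal sketch of how a single clean signal might be degraded by reverberation, additive noise, bandwidth limitation, and clipping, so that one model could be asked to undo all of them. It is not the challenge's data simulation pipeline: the function names, the order of the distortions, the default parameter values, and the synthetic sine, noise, and impulse response are all assumptions chosen for illustration.

```python
import numpy as np


def add_noise(speech, noise, snr_db):
    """Mix noise into speech at the requested signal-to-noise ratio (dB)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def add_reverb(speech, rir):
    """Convolve with a room impulse response, truncated to the input length."""
    return np.convolve(speech, rir)[: len(speech)]


def limit_bandwidth(speech, factor):
    """Crude bandwidth limitation: zero all FFT bins above Nyquist / factor."""
    spec = np.fft.rfft(speech)
    cutoff = len(spec) // factor
    spec[cutoff:] = 0
    return np.fft.irfft(spec, n=len(speech))


def clip(speech, threshold):
    """Hard-clip the waveform to [-threshold, threshold]."""
    return np.clip(speech, -threshold, threshold)


def degrade(speech, noise, rir, snr_db=5.0, bw_factor=2, clip_threshold=0.5):
    """Apply all four distortions in one (assumed) order."""
    x = add_reverb(speech, rir)
    x = add_noise(x, noise, snr_db)
    x = limit_bandwidth(x, bw_factor)
    return clip(x, clip_threshold)


# Synthetic stand-ins (hypothetical): a sine for speech, white noise,
# and an exponentially decaying random impulse response.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = 0.8 * np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(fs)
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
rir[0] = 1.0  # direct path

degraded = degrade(speech, noise, rir)
```

A universal SE model in the sense of the abstract would take `degraded` as input and try to recover `speech`, without being told which of the distortions were applied.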
Taxonomy
Topics: Speech and Audio Processing · Infant Health and Development · Advanced Adaptive Filtering Techniques
