URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Wangyou Zhang; Robin Scheibler; Kohei Saijo; Samuele Cornell; Chenda Li; Zhaoheng Ni; Anurag Kumar; Jan Pirklbauer; Marvin Sach; Shinji Watanabe; Tim Fingscheidt; Yanmin Qian

arXiv:2406.04660·eess.AS·September 25, 2024

TL;DR

The URGENT challenge aims to advance universal speech enhancement by unifying various sub-tasks into a single framework, using diverse data and comprehensive metrics to evaluate robustness and generalizability.

Contribution

It introduces a new challenge and a unified framework for multiple speech enhancement sub-tasks, promoting research on universal, robust, and generalizable SE models.

Findings

01 Preliminary baseline experiments reveal insights into model performance.

02 Diverse evaluation data highlights robustness challenges.

03 Unified framework enables comprehensive comparison of SE approaches.

Abstract

The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generalizability of SE. We aim to extend the SE definition to cover different sub-tasks to explore the limits of SE models, starting from denoising, dereverberation, bandwidth extension, and declipping. A novel framework is proposed to unify all these sub-tasks in a single model, allowing the use of all existing SE approaches. We collected public speech and noise data from different domains to construct diverse evaluation data. Finally, we discuss the insights gained from our preliminary baseline…
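To make the four sub-tasks named in the abstract concrete, the sketch below simulates the corresponding degradations on a clean signal: additive noise, reverberation, band limitation, and clipping. It is an illustration only, not the challenge's actual data simulation. All function names, parameter values, and the order in which the distortions are applied are assumptions, and the moving-average filter is a crude stand-in for resampling to a lower sampling rate.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise(speech, noise, snr_db):
    """Add noise scaled so the mixture has the requested SNR in dB."""
    noise = list(noise)
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * reps)[: len(speech)]  # repeat, then truncate
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def add_reverb(speech, rir):
    """Convolve with a room impulse response, keeping the input length."""
    out = [0.0] * len(speech)
    for i, s in enumerate(speech):
        for j, h in enumerate(rir):
            if i + j < len(out):
                out[i + j] += s * h
    return out


def limit_bandwidth(speech, width):
    """Crude low-pass: causal moving average over `width` samples."""
    out = []
    acc = 0.0
    for i, s in enumerate(speech):
        acc += s
        if i >= width:
            acc -= speech[i - width]  # drop the sample leaving the window
        out.append(acc / width)
    return out


def clip(speech, threshold):
    """Hard-clip samples to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in speech]


def degrade(speech, noise, rir, snr_db=5.0, width=4, threshold=0.5):
    """Apply all four distortions in one (assumed) order."""
    x = add_reverb(speech, rir)
    x = add_noise(x, noise, snr_db)
    x = limit_bandwidth(x, width)
    return clip(x, threshold)


if __name__ == "__main__":
    random.seed(0)
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [random.gauss(0.0, 1.0) for _ in range(500)]
    degraded = degrade(clean, noise, rir=[1.0, 0.5, 0.25])
    print(len(degraded), max(abs(v) for v in degraded))
```

A universal SE model in the sense of the abstract would take a signal like `degraded` as input and try to recover `clean`, whichever subset of the four distortions is present.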

Taxonomy

Topics: Speech and Audio Processing · Infant Health and Development · Advanced Adaptive Filtering Techniques

Methods: Focus