Interspeech 2025 URGENT Speech Enhancement Challenge

Kohei Saijo; Wangyou Zhang; Samuele Cornell; Robin Scheibler; Chenda Li; Zhaoheng Ni; Anurag Kumar; Marvin Sach; Yihui Fu; Wei Wang; Tim Fingscheidt; Shinji Watanabe

arXiv:2505.23212 · eess.AS · June 3, 2025 · Interspeech

TL;DR

The Interspeech 2025 URGENT Challenge advances universal speech enhancement by examining language dependency, universality across more distortion types, data scalability, and the effectiveness of noisy training data. Of the 32 submissions, the best system is discriminative and most other competitive systems are hybrid.

Contribution

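This paper introduces the second edition of the URGENT Challenge, which extends universal speech enhancement research to four aspects that have received limited attention: language dependency, universality across more distortion types, data scalability, and the effectiveness of noisy training data.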

Findings

01

The best system uses a discriminative model and most other competitive systems are hybrid, yet some generative or hybrid approaches are preferred in subjective evaluations.

02

Generative models may exhibit language dependency, affecting universality.

03

Noisy training data can be effective for speech enhancement.

Abstract

There has been a growing effort to develop universal speech enhancement (SE) to handle inputs with various speech distortions and recording conditions. The URGENT Challenge series aims to foster such universal SE by embracing a broad range of distortion types, increasing data diversity, and incorporating extensive evaluation metrics. This work introduces the Interspeech 2025 URGENT Challenge, the second edition of the series, to explore several aspects that have received limited attention so far: language dependency, universality for more distortion types, data scalability, and the effectiveness of using noisy training data. We received 32 submissions, where the best system uses a discriminative model, while most other competitive ones are hybrid methods. Analysis reveals some key findings: (i) some generative or hybrid approaches are preferred in subjective evaluations over the top…

Code & Models

Datasets

urgent-challenge/urgent2025-sqa
dataset · 141 dl

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis

Methods: Softmax · Attention Is All You Need