URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

Jiahe Wang; Chenda Li; Wei Wang; Wangyou Zhang; Samuele Cornell; Marvin Sach; Robin Scheibler; Kohei Saijo; Yihui Fu; Zhaoheng Ni; Anurag Kumar; Tim Fingscheidt; Shinji Watanabe; Yanmin Qian

arXiv:2506.23874 · eess.AS · July 1, 2025

TL;DR

URGENT-PK is a perceptually-aligned ranking model for speech enhancement that uses pairwise comparisons to reliably rank system quality, outperforming existing methods with limited training data.

Contribution

It introduces a novel pairwise ranking approach for speech quality assessment that effectively utilizes limited data and improves system ranking accuracy.

Findings

1. Outperforms state-of-the-art baselines in system ranking
2. Efficiently utilizes limited training data
3. Simple network architecture achieves superior results

Abstract

The Mean Opinion Score (MOS) is fundamental to speech quality assessment. However, acquiring it requires substantial human annotation. Although deep neural network approaches, such as DNSMOS and UTMOS, have been developed to predict MOS and avoid this cost, they often suffer from insufficient training data. Recognizing that comparing speech enhancement (SE) systems calls for a reliable relative ranking rather than absolute scores, we propose URGENT-PK, a novel ranking approach leveraging pairwise comparisons. URGENT-PK takes homologous enhanced speech pairs as input to predict relative quality rankings. This pairwise paradigm efficiently utilizes limited training data, since each pairwise permutation of multiple systems' outputs constitutes a training instance. Experiments across multiple open test sets demonstrate URGENT-PK's superior system-level ranking performance over state-of-the-art…
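The pairwise data construction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "homologous" means outputs of different SE systems for the same noisy utterance, and the names `build_pairs`, `sys_A`, `sys_B`, `sys_C`, and the tie label of 0.5 are hypothetical choices made only for illustration.

```python
from itertools import permutations


def build_pairs(mos_by_system):
    """Turn per-system MOS labels for ONE utterance into ordered training pairs.

    mos_by_system: dict mapping a (hypothetical) system name to the MOS of
    that system's enhanced output for the same noisy input.
    Returns a list of (first, second, label) tuples, where label is 1.0 if
    the first output is rated higher, 0.0 if lower, and 0.5 for a tie
    (the tie convention is an assumption, not taken from the paper).
    """
    pairs = []
    for a, b in permutations(mos_by_system, 2):  # all ordered pairs
        if mos_by_system[a] == mos_by_system[b]:
            label = 0.5
        elif mos_by_system[a] > mos_by_system[b]:
            label = 1.0
        else:
            label = 0.0
        pairs.append((a, b, label))
    return pairs


# Three systems rated on one utterance yield 3 * 2 = 6 ordered pairs.
example = {"sys_A": 3.8, "sys_B": 2.9, "sys_C": 3.8}
pairs = build_pairs(example)
for first, second, label in pairs:
    print(first, second, label)
```

With n systems rated on the same utterance, this yields n(n-1) ordered pairs from only n MOS labels, which is one way to read the claim that the pairwise paradigm uses limited annotated data efficiently.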

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Image and Video Quality Assessment