Uni-VERSA: Versatile Speech Assessment with a Unified Network

Jiatong Shi; Hye-Jin Shim; Shinji Watanabe

arXiv:2505.20741 · cs.SD · May 28, 2025

TL;DR

Uni-VERSA is a unified neural network that predicts multiple speech quality metrics simultaneously, offering a comprehensive, scalable, and human-aligned alternative to traditional subjective listening tests.

Contribution

Uni-VERSA introduces a multi-task framework for speech assessment that covers naturalness, intelligibility, speaker characteristics, prosody, and noise in a single model, improving efficiency and consistency over running a separate metric for each aspect.
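The single-model, multi-metric idea can be illustrated with a minimal sketch: one shared utterance representation is computed once and fed to a separate prediction head per metric. This is not the authors' architecture. The metric names, the mean pooling, the linear heads, and the class `MultiMetricPredictor` are all hypothetical, and random weights stand in for trained parameters.

```python
import random

# Hypothetical metric names, chosen only for illustration.
METRICS = ["naturalness", "intelligibility", "noise"]


def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


class MultiMetricPredictor:
    """Shared utterance representation feeding one linear head per metric."""

    def __init__(self, feat_dim, metrics, seed=0):
        rng = random.Random(seed)
        # One (weights, bias) pair per metric; random values stand in
        # for parameters that would normally be learned.
        self.heads = {
            metric: ([rng.uniform(-0.1, 0.1) for _ in range(feat_dim)], 0.0)
            for metric in metrics
        }

    def predict(self, frames):
        shared = mean_pool(frames)  # computed once, reused by every head
        return {
            metric: sum(w * x for w, x in zip(weights, shared)) + bias
            for metric, (weights, bias) in self.heads.items()
        }


# Two frames of 4-dimensional features (made-up numbers).
frames = [[0.2, 0.4, 0.1, 0.3], [0.0, 0.2, 0.5, 0.1]]
model = MultiMetricPredictor(feat_dim=4, metrics=METRICS)
scores = model.predict(frames)
```

The point of the design is that the expensive part (the shared representation) is evaluated once per utterance, while adding another metric only adds another small head.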

Findings

01

Provides a viable alternative to single-aspect evaluation methods on a benchmark based on the URGENT24 challenge

02

Aligns closely with human perception of speech quality

03

Demonstrates versatility across speech enhancement and synthesis tasks

Abstract

Subjective listening tests remain the gold standard for speech quality assessment, but are costly, variable, and difficult to scale. In contrast, existing objective metrics, such as PESQ, F0 correlation, and DNSMOS, typically capture only specific aspects of speech quality. To address these limitations, we introduce Uni-VERSA, a unified network that simultaneously predicts various objective metrics, encompassing naturalness, intelligibility, speaker characteristics, prosody, and noise, for a comprehensive evaluation of speech signals. We formalize its framework, evaluation protocol, and applications in speech enhancement, synthesis, and quality control. A benchmark based on the URGENT24 challenge, along with a baseline leveraging self-supervised representations, demonstrates that Uni-VERSA provides a viable alternative to single-aspect evaluation methods. Moreover, it aligns closely…
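One way to check whether a unified predictor is a viable stand-in for individual metrics is to compare its predictions with the reference metric values, one metric at a time. The sketch below uses Pearson correlation for that comparison. The abstract does not state which agreement measures the paper uses, so the choice of Pearson correlation, the function names, and the toy numbers are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def per_metric_correlation(predicted, reference):
    """Correlate predicted and reference scores separately for each metric."""
    return {name: pearson(predicted[name], reference[name]) for name in reference}


# Toy values for three utterances (made-up numbers, not results from the paper).
predicted = {"pesq": [1.0, 2.0, 3.0], "dnsmos": [3.0, 2.0, 1.0]}
reference = {"pesq": [2.0, 4.0, 6.0], "dnsmos": [1.0, 2.0, 3.0]}
report = per_metric_correlation(predicted, reference)
```

Reporting agreement per metric, rather than as one pooled number, keeps a weakly predicted aspect from being hidden behind well-predicted ones.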

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research