Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024

Marie Kunešová; Aleš Pražák; Jan Lehečka

arXiv:2506.00506 · eess.AS · April 28, 2026

TL;DR

This paper introduces a transfer-learning-based system for non-intrusive speech quality prediction on noisy and enhanced speech. Trained with only 100 subjectively labeled utterances, it achieved the best BAK prediction and second place in OVRL in Track 3 of the VoiceMOS 2024 Challenge.

Contribution

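It proposes a two-stage transfer learning approach built on wav2vec 2.0. The model is first fine-tuned on automatically labeled noisy data and then adapted to the challenge data. Post-challenge experiments show that adding artificially degraded data to the first stage improves SIG prediction under severe data constraints.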

Findings

1. Achieved the best BAK prediction (LCC = 0.867).
2. Placed second in OVRL (LCC = 0.711).
3. Adding artificially degraded data to the first fine-tuning stage substantially improved SIG prediction, raising LCC from 0.207 to 0.516.
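
All of these results are reported as LCC, the linear (Pearson) correlation coefficient between predicted and subjective scores. Below is a minimal stdlib-only sketch of that metric. It assumes plain Pearson correlation over whatever set of scores is being compared. The function name `lcc` is hypothetical and does not come from the challenge's official scoring code.

```python
import math
from typing import Sequence


def lcc(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between predicted and true scores."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_a)
```

For example, `lcc([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.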

Abstract

We present a system for non-intrusive prediction of speech quality in noisy and enhanced speech, developed for Track 3 of the VoiceMOS 2024 Challenge. The task required estimating the ITU-T P.835 metrics SIG, BAK, and OVRL without reference signals and with only 100 subjectively labeled utterances for training. Our approach uses wav2vec 2.0 with a two-stage transfer learning strategy: initial fine-tuning on automatically labeled noisy data, followed by adaptation to the challenge data. The system achieved the best performance on BAK prediction (LCC=0.867) and a very close second place in OVRL (LCC=0.711) in the official evaluation. Post-challenge experiments show that adding artificially degraded data to the first fine-tuning stage substantially improves SIG prediction, raising correlation with ground truth scores from 0.207 to 0.516. These results demonstrate that transfer learning…
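
The two-stage strategy described above can be illustrated with a deliberately tiny stand-in. This is a sketch, not the authors' implementation. It replaces wav2vec 2.0 with a hypothetical linear three-output head (`LinearHead`) and uses synthetic features. The learning rates, epoch counts, and the +0.2 bias in the "automatic" stage-1 labels are all illustrative assumptions. The point is the schedule: stage 1 trains on a large, noisily labeled set, and stage 2 continues from the same weights on only 100 clean labels.

```python
import random
from typing import List, Tuple

# One sample: (feature vector, [SIG, BAK, OVRL] targets).
Sample = Tuple[List[float], List[float]]


class LinearHead:
    """Hypothetical toy regressor standing in for wav2vec 2.0 plus a regression head."""

    def __init__(self, n_features: int, n_outputs: int = 3) -> None:
        self.w = [[0.0] * n_features for _ in range(n_outputs)]
        self.b = [0.0] * n_outputs

    def predict(self, x: List[float]) -> List[float]:
        return [sum(wj * xj for wj, xj in zip(row, x)) + bk
                for row, bk in zip(self.w, self.b)]

    def mse(self, data: List[Sample]) -> float:
        sq = sum((p - t) ** 2 for x, y in data for p, t in zip(self.predict(x), y))
        return sq / (len(data) * len(self.b))


def fine_tune(model: LinearHead, data: List[Sample], lr: float,
              epochs: int, seed: int = 0) -> LinearHead:
    """Plain per-sample SGD on squared error; updates `model` in place."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            for k, (p, t) in enumerate(zip(model.predict(x), y)):
                g = p - t  # d/dp of 0.5 * (p - t) ** 2
                model.w[k] = [wj - lr * g * xj for wj, xj in zip(model.w[k], x)]
                model.b[k] -= lr * g
    return model


def make_data(n: int, rng: random.Random, w_true: List[List[float]],
              b_true: List[float], noise: float, shift: float) -> List[Sample]:
    """Synthetic samples from a linear 'true' mapping, optionally noisy and offset."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(len(w_true[0]))]
        y = [sum(wj * xj for wj, xj in zip(row, x)) + bk + shift + rng.gauss(0.0, noise)
             for row, bk in zip(w_true, b_true)]
        data.append((x, y))
    return data


def two_stage_demo(seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    w_true = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
    b_true = [3.0, 3.5, 2.8]  # arbitrary MOS-like offsets for SIG, BAK, OVRL
    # Stage-1 data: plentiful, but labels are noisy and biased (assumed +0.2).
    auto = make_data(2000, rng, w_true, b_true, noise=0.3, shift=0.2)
    # Stage-2 data: only 100 utterances with clean labels.
    target = make_data(100, rng, w_true, b_true, noise=0.0, shift=0.0)

    model = fine_tune(LinearHead(4), auto, lr=0.05, epochs=5, seed=seed)
    after_stage1 = model.mse(target)
    # Stage 2 starts from the stage-1 weights, with a smaller learning rate.
    fine_tune(model, target, lr=0.01, epochs=20, seed=seed + 1)
    after_stage2 = model.mse(target)
    return after_stage1, after_stage2


if __name__ == "__main__":
    s1, s2 = two_stage_demo()
    print(f"target-set MSE after stage 1: {s1:.4f}, after stage 2: {s2:.6f}")
```

`two_stage_demo()` returns the target-set MSE after each stage. In this toy setup, stage 2 removes the systematic offset inherited from the automatic labels. It is not meant to reproduce the paper's gains.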