Leveraging LLMs for Scalable Non-intrusive Speech Quality Assessment

Fredrik Cumlin; Xinyu Liang; Anubhab Ghosh; Saikat Chatterjee

arXiv:2508.06284·eess.AS·August 11, 2025

Open Access

TL;DR

This paper explores using large language models as pseudo-raters to generate labeled data for training speech quality assessment systems, aiming to overcome data scarcity and improve generalization across diverse datasets.

Contribution

The paper introduces a two-stage training approach, pretraining on LLM-generated labels and then fine-tuning on human labels, to improve speech quality assessment performance and scalability.

Findings

01 Two-stage training improves correlation with human ratings.

02 LLM-labeled data can supplement limited human annotations.

03 The approach enhances generalization across datasets and languages.
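
As a rough illustration of what "correlation with human ratings" can mean in practice, the sketch below computes a Pearson correlation between human scores and model predictions. The choice of Pearson is an assumption, since this summary does not name the metric, and the ratings are made up for illustration rather than taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on a 1-5 scale (illustrative only, not paper data).
human_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
predicted_mos = [1.8, 2.1, 2.9, 3.7, 4.6]

r = pearson(human_mos, predicted_mos)
print(f"Pearson correlation: {r:.3f}")
```

A value near 1 would mean the predictions move almost linearly with the human scores; a rank-based measure such as Spearman could be swapped in if only the ordering matters.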

Abstract

Non-intrusive speech quality assessment (SQA) systems suffer from limited training data and costly human annotations, hindering their generalization to real-time conferencing calls. In this work, we propose leveraging large language models (LLMs) as pseudo-raters for speech quality to address these data bottlenecks. We construct LibriAugmented, a dataset consisting of 101,129 speech clips with simulated degradations labeled by a fine-tuned auditory LLM (Vicuna-7b-v1.5). We compare three training strategies: using human-labeled data, using LLM-labeled data, and a two-stage approach (pretraining on LLM labels, then fine-tuning on human labels), using both DNSMOS Pro and DeePMOS. We test on several datasets across languages and quality degradations. While LLM-labeled training yields mixed results compared to human-labeled training, we provide empirical evidence that the two-stage approach…
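
The two-stage strategy described in the abstract (pretrain on LLM labels, then fine-tune on human labels) can be sketched on a toy problem. What follows is a minimal illustration under loud assumptions, not the authors' method: a one-feature linear model stands in for DNSMOS Pro or DeePMOS, the data is synthetic, and the names `true_mos`, `make_data`, and `train` as well as all noise levels and learning rates are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def true_mos(x):
    # Hypothetical ground truth: quality rises linearly with a feature x in [0, 1].
    return 1.0 + 4.0 * x


def make_data(n, bias, noise):
    """Synthetic (feature, label) pairs; bias and noise model label imperfections."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, true_mos(x) + bias + rng.gauss(0, noise)))
    return data


def mse(params, data):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr, epochs):
    """Full-batch gradient descent on MSE for the linear model y = w * x + b."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Stage 1 data: many clips with pseudo-labels (assumed biased and noisy).
pseudo_labeled = make_data(1000, bias=0.3, noise=0.3)
# Stage 2 data: few clips with human labels (assumed unbiased, less noisy).
human_labeled = make_data(30, bias=0.0, noise=0.1)

initial = (0.0, 0.0)
pretrained = train(initial, pseudo_labeled, lr=0.1, epochs=300)   # stage 1
finetuned = train(pretrained, human_labeled, lr=0.05, epochs=200)  # stage 2

print("human-set MSE after stage 1:", round(mse(pretrained, human_labeled), 4))
print("human-set MSE after stage 2:", round(mse(finetuned, human_labeled), 4))
```

The design point the sketch tries to convey is that stage 1 gives the model a good starting point from cheap labels, and stage 2 only has to correct the residual mismatch using the scarce human labels. Whether this carries over to real SQA models is what the paper evaluates empirically.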

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques