Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

Ryandhimas E. Zezario; Dyah A. M. G. Wisnu; Szu-Wei Fu; Sabato Marco Siniscalchi; Hsin-Min Wang; Yu Tsao

arXiv:2604.13528 · eess.AS · April 16, 2026

TL;DR

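GatherMOS is a large language model-based framework for speech quality evaluation that aggregates lightweight acoustic descriptors with pseudo-labels from DNSMOS and VQScore. On VoiceBank-DEMAND it outperforms both pseudo-label sources, naive score averaging, and learning-based models trained with limited labeled data.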

Contribution

The paper presents GatherMOS, an LLM-based framework that uses the LLM as a meta-evaluator, combining lightweight acoustic descriptors with pseudo-labels from DNSMOS and VQScore to predict perceptual mean opinion scores (MOS).

Findings

01

GatherMOS outperforms DNSMOS, VQScore, naive score averaging, and learning-based models (CNN-BLSTM, MOS-SSL) trained with limited labeled data on VoiceBank-DEMAND.

02

Zero-shot GatherMOS maintains stable performance across conditions.

03

Few-shot guidance yields large gains when the support samples match the test conditions.

Abstract

In this paper, we introduce GatherMOS, a novel framework that leverages large language models (LLMs) as meta-evaluators to aggregate diverse signals into quality predictions. GatherMOS integrates lightweight acoustic descriptors with pseudo-labels from DNSMOS and VQScore, enabling the LLM to reason over heterogeneous inputs and infer perceptual mean opinion scores (MOS). We further explore both zero-shot and few-shot in-context learning setups, showing that zero-shot GatherMOS maintains stable performance across diverse conditions, while few-shot guidance yields large gains when support samples match the test conditions. Experiments on the VoiceBank-DEMAND dataset demonstrate that GatherMOS consistently outperforms DNSMOS, VQScore, naive score averaging, and even learning-based models such as CNN-BLSTM and MOS-SSL when trained under limited labeled-data conditions. These results…
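
As a rough illustration of the aggregation idea in the abstract, the sketch below shows one way acoustic descriptors and DNSMOS/VQScore pseudo-labels could be serialized into a zero-shot or few-shot prompt, and how a numeric MOS might be read back from the reply. The prompt wording, the `Utterance` container, the descriptor names, and the helpers `describe`, `build_prompt`, and `parse_mos` are all assumptions made for illustration and are not taken from the paper. The call to the LLM itself is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Utterance:
    """Hypothetical container for the signals describing one utterance."""
    descriptors: dict            # lightweight acoustic descriptors (names are assumptions)
    dnsmos: float                # pseudo-label from DNSMOS
    vqscore: float               # pseudo-label from VQScore
    mos: Optional[float] = None  # human MOS, known only for support samples


def describe(u: Utterance) -> str:
    """Render one utterance's descriptors and pseudo-labels as a line of text."""
    feats = ", ".join(f"{k}={v:.2f}" for k, v in sorted(u.descriptors.items()))
    return f"{feats}; DNSMOS={u.dnsmos:.2f}; VQScore={u.vqscore:.2f}"


def build_prompt(query: Utterance, support: Sequence[Utterance] = ()) -> str:
    """Build a prompt; an empty `support` gives the zero-shot variant."""
    lines = [
        "Predict the mean opinion score (MOS, 1 to 5) of the last utterance.",
        "Answer with a single number.",
    ]
    # Few-shot case: each support sample is shown together with its labeled MOS.
    for s in support:
        lines.append(f"Utterance: {describe(s)}")
        lines.append(f"MOS: {s.mos:.2f}")
    # The query utterance is left open for the model to complete.
    lines.append(f"Utterance: {describe(query)}")
    lines.append("MOS:")
    return "\n".join(lines)


def parse_mos(reply: str) -> float:
    """Take the first number in the reply and clip it to the 1-5 MOS range."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError("no numeric score found in reply")
    return min(5.0, max(1.0, float(match.group())))
```

Calling `build_prompt(query)` gives the zero-shot prompt, while `build_prompt(query, support)` prepends labeled examples. Under this sketch, the few-shot condition highlighted in the abstract would correspond to choosing `support` utterances recorded under conditions similar to the query.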
