Unseen but not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models

Jaden Pieper; Stephen D. Voran

arXiv:2601.21110·eess.AS·January 30, 2026

TL;DR

This paper presents Dataset Concealment (DSC), an evaluation procedure for speech quality estimation models that quantifies how well they generalize to unseen datasets and exposes dataset effects. It is demonstrated with three models, nine training datasets, and nine unseen datasets.

Contribution

The paper introduces DSC as an interpretable evaluation framework for speech quality estimation models, and shows the benefits of addressing the corpus effect with the dataset Aligner from AlignNet when training on multiple datasets.

Findings

01

DSC quantifies and decomposes the performance gap between research results and real-world application requirements.

02

Using the dataset Aligner from AlignNet when training on multiple datasets improves model generalization to unseen data.

03

Adding the dataset Aligner improves the speech quality estimates of the Wav2Vec2.0-based model.

Abstract

We introduce Dataset Concealment (DSC), a rigorous new procedure for evaluating and interpreting objective speech quality estimation models. DSC quantifies and decomposes the performance gap between research results and real-world application requirements, while offering context and additional insights into model behavior and dataset characteristics. We also show the benefits of addressing the corpus effect by using the dataset Aligner from AlignNet when training models with multiple datasets. We demonstrate DSC and the improvements from the Aligner using nine training datasets and nine unseen datasets with three well-studied models: MOSNet, NISQA, and a Wav2Vec2.0-based model. DSC provides interpretable views of the generalization capabilities and limitations of models, while allowing all available data to be used at training. An additional result is that adding the 1000 parameter…
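
The abstract does not spell out the DSC procedure, so the following is only an illustrative sketch. It assumes that "concealing" a dataset means withholding it from training, scoring the resulting model on it, and comparing against a model trained on all datasets. The toy line-fit model, the RMSE metric, the function name `dataset_concealment`, and the data are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(preds, targets):
    """Root-mean-square error between predictions and targets."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


def train(datasets, names):
    """Pool the named datasets and fit the toy model on them."""
    xs = [x for name in names for x, _ in datasets[name]]
    ys = [y for name in names for _, y in datasets[name]]
    return fit_line(xs, ys)


def evaluate(model, samples):
    """RMSE of the toy model on one dataset."""
    slope, intercept = model
    preds = [slope * x + intercept for x, _ in samples]
    return rmse(preds, [y for _, y in samples])


def dataset_concealment(datasets):
    """Assumed reading of DSC: conceal each dataset in turn and compare scores.

    'seen' is the error when the dataset was part of training;
    'concealed' is the error when it was withheld from training.
    """
    all_names = list(datasets)
    full_model = train(datasets, all_names)
    report = {}
    for held_out in all_names:
        visible = [name for name in all_names if name != held_out]
        concealed_model = train(datasets, visible)
        seen = evaluate(full_model, datasets[held_out])
        concealed = evaluate(concealed_model, datasets[held_out])
        report[held_out] = {"seen": seen, "concealed": concealed, "gap": concealed - seen}
    return report


# Hypothetical (feature, MOS) pairs; each dataset uses its rating scale differently.
toy_datasets = {
    "A": [(0.1, 1.2), (0.5, 3.0), (0.9, 4.6)],
    "B": [(0.2, 2.0), (0.6, 3.4), (0.8, 4.1)],
    "C": [(0.1, 2.5), (0.5, 3.5), (0.9, 4.4)],
}

if __name__ == "__main__":
    for name, scores in dataset_concealment(toy_datasets).items():
        print(name, {key: round(value, 3) for key, value in scores.items()})
```

In this sketch the per-dataset `gap` is the part of the error attributable to the dataset being unseen. RMSE is used instead of correlation because the predictions of a one-feature line fit have the same correlation with the targets regardless of slope and intercept, which would hide the effect in this toy setting.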

Taxonomy

Topics: Speech and Audio Processing · Image and Video Quality Assessment · Face Recognition and Analysis