Unseen but not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models
Jaden Pieper, Stephen D. Voran

TL;DR
This paper presents Dataset Concealment (DSC), a new evaluation procedure for speech quality estimation models that shows how well they generalize to unseen data and how individual datasets affect them. It is demonstrated with three models, nine training datasets, and nine unseen datasets.
Contribution
DSC, a new and interpretable evaluation procedure for speech quality estimation models, together with evidence that using the dataset Aligner from AlignNet to address the corpus effect is beneficial when training on multiple datasets.
Findings
DSC quantifies and decomposes the performance gap between research results and real-world application requirements.
Using the dataset Aligner from AlignNet when training on multiple datasets improves model generalization to unseen data.
Adding the dataset Aligner to the Wav2Vec2.0-based model improves its speech quality estimation.
Abstract
We introduce Dataset Concealment (DSC), a rigorous new procedure for evaluating and interpreting objective speech quality estimation models. DSC quantifies and decomposes the performance gap between research results and real-world application requirements, while offering context and additional insights into model behavior and dataset characteristics. We also show the benefits of addressing the corpus effect by using the dataset Aligner from AlignNet when training models with multiple datasets. We demonstrate DSC and the improvements from the Aligner using nine training datasets and nine unseen datasets with three well-studied models: MOSNet, NISQA, and a Wav2Vec2.0-based model. DSC provides interpretable views of the generalization capabilities and limitations of models, while allowing all available data to be used at training. An additional result is that adding the 1000 parameter…
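The abstract does not spell out the steps of DSC, so the following is only a rough sketch of the general idea of concealing a dataset, not the authors' procedure. It assumes a hold-one-dataset-out comparison: train once with every dataset and once with one dataset hidden, then compare the error on that dataset in both cases. The function names (`fit_linear`, `rmse`, `concealment_gap`), the one-feature linear "model", the use of RMSE, and the toy data are all hypothetical. The toy datasets share the same trend but differ by a constant offset, a simple stand-in for the corpus effect.

```python
from math import sqrt
from statistics import mean


def fit_linear(pairs):
    """Hypothetical stand-in for model training: least-squares line feature -> MOS."""
    xs = [f for f, _ in pairs]
    ys = [m for _, m in pairs]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda f: slope * f + intercept


def rmse(model, pairs):
    """Root-mean-square error of the model's predictions on (feature, MOS) pairs."""
    return sqrt(mean((model(f) - m) ** 2 for f, m in pairs))


def concealment_gap(datasets, train_fn, concealed_name):
    """Compare error on one dataset when it is seen vs. concealed during training.

    Assumption: this hold-one-out comparison is an illustration only; the
    source does not define DSC at this level of detail.
    """
    all_pairs = [p for pairs in datasets.values() for p in pairs]
    visible_pairs = [
        p for name, pairs in datasets.items() if name != concealed_name for p in pairs
    ]
    target = datasets[concealed_name]
    seen_err = rmse(train_fn(all_pairs), target)
    concealed_err = rmse(train_fn(visible_pairs), target)
    return seen_err, concealed_err, concealed_err - seen_err


# Toy (feature, MOS) data: same slope, different per-dataset offsets.
datasets = {
    "A": [(1.0, 1.5), (2.0, 2.5), (3.0, 3.5)],
    "B": [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
    "C": [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)],
}

for name in datasets:
    seen, concealed, gap = concealment_gap(datasets, fit_linear, name)
    print(f"{name}: seen RMSE={seen:.3f}, concealed RMSE={concealed:.3f}, gap={gap:.3f}")
```

RMSE is used here rather than correlation because correlation ignores constant offsets, so it would not register the offset between the toy datasets. A real evaluation would substitute an actual speech quality model and whichever metrics the paper reports.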
Taxonomy
Topics: Speech and Audio Processing · Image and Video Quality Assessment · Face Recognition and Analysis
