TL;DR
This paper introduces MOS-Bench, a comprehensive dataset collection for subjective speech quality assessment, highlighting challenges in out-of-domain generalization and evaluating simple data pooling as an effective solution.
Contribution
The paper presents MOS-Bench, a diverse dataset collection for subjective speech quality assessment (SSQA), systematically studies out-of-domain generalization, and proposes data pooling as a practical approach to multiple-dataset training.
Findings
Data pooling improves out-of-domain generalization.
Variation in training data enhances robustness.
Current models struggle with out-of-domain speech quality prediction.
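To make "data pooling" concrete, here is a minimal sketch, not the paper's implementation. It assumes each training set is a list of (audio path, MOS label) pairs and merges them into one shuffled list, so that a single model is trained on all sets with no domain-specific handling. The names `pool_datasets`, `set_a`, `set_b`, and the file paths are hypothetical placeholders.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical sample layout: (path to audio file, MOS label).
Sample = Tuple[str, float]


def pool_datasets(
    datasets: Dict[str, List[Sample]], seed: int = 0
) -> List[Tuple[str, str, float]]:
    """Merge several MOS-labelled datasets into one shuffled training list.

    Each pooled item keeps the name of its source dataset so that
    per-dataset statistics can still be computed later.
    """
    pooled = [
        (name, wav_path, mos)
        for name, samples in datasets.items()
        for wav_path, mos in samples
    ]
    # Shuffle with a fixed seed so batches mix datasets reproducibly.
    random.Random(seed).shuffle(pooled)
    return pooled


# Toy placeholder data, not taken from MOS-Bench.
toy_sets = {
    "set_a": [("a1.wav", 3.5), ("a2.wav", 4.0)],
    "set_b": [("b1.wav", 2.0)],
}
pooled = pool_datasets(toy_sets)
```

The sketch treats every label as if it sat on one shared scale, which is the simplifying assumption behind plain pooling; a domain-aware method would instead use the dataset name when predicting scores.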
Abstract
In this paper, we study the task of subjective speech quality assessment (SSQA), which refers to predicting the perceptual quality of speech. Owing to the development of deep neural network models, SSQA has greatly advanced and has been widely applied in scientific papers to evaluate speech generation systems. Nonetheless, the insufficient out-of-domain (OOD) generalization ability of current SSQA models is underexplored and often overlooked by researchers. To study this problem systematically, we present MOS-Bench, a diverse SSQA dataset collection that currently contains 8 training sets and 17 test sets. Through extensive experiments, we first highlight the OOD generalization challenges of existing models. We then evaluate the efficacy of multiple-dataset training, comparing straightforward data pooling against AlignNet, an existing domain-aware method. We demonstrate that pooling…
