Universal Preference-Score-based Pairwise Speech Quality Assessment
Yu-Fei Shi, Yang Ai, Zhen-Hua Ling

TL;DR
This paper introduces UPPSQA, a universal model for pairwise speech quality assessment. It predicts the mean opinion score (MOS) of each sample in a pair separately, aggregates the two scores into a preference score, and outperforms baselines across the training and test scenarios studied.
Contribution
The paper presents a universal preference-score-based model for pairwise speech quality assessment, and addresses the scarcity of preference data by constructing a pairwise speech dataset from a MOS dataset.
Findings
UPPSQA outperforms baseline models in accuracy.
The model is effective across training scenarios with different data types and label conditions, and in both in-domain and out-of-domain test scenarios.
A new pairwise speech dataset was constructed from a MOS dataset to address the scarcity of preference data.
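As a rough illustration of how pairwise labels could be derived from a MOS dataset, the sketch below pairs up utterances and labels each pair by which one has the higher MOS. This summary does not describe the authors' actual construction procedure, so the function name `build_pairs`, the `min_gap` parameter, the tie-skipping rule, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def build_pairs(mos_by_utt, min_gap=0.0):
    """Hypothetical helper: turn {utterance_id: MOS} into labeled pairs.

    Returns a list of (id_a, id_b, label) where label is 1 if utterance A
    has the higher MOS and 0 if utterance B does. Pairs whose MOS gap is
    at most `min_gap` are skipped (an assumed rule, not from the paper).
    """
    pairs = []
    # Sort by utterance id so the output order is deterministic.
    items = sorted(mos_by_utt.items())
    for (id_a, mos_a), (id_b, mos_b) in combinations(items, 2):
        if abs(mos_a - mos_b) <= min_gap:
            continue  # no clear preference between the two samples
        label = 1 if mos_a > mos_b else 0
        pairs.append((id_a, id_b, label))
    return pairs


# Toy MOS values, invented for this example.
toy_mos = {"utt1": 4.2, "utt2": 3.1, "utt3": 4.2}
toy_pairs = build_pairs(toy_mos)
```

With the toy data, `utt1` and `utt3` have equal MOS and are skipped, leaving two labeled pairs.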
Abstract
To compare the performance of two speech generation systems, one of the most effective approaches is estimating the preference score between their generated speech. This paper proposes a novel universal preference-score-based pairwise speech quality assessment (UPPSQA) model, aimed at predicting the preference score between paired speech samples to determine which one has better quality. The model first predicts the absolute mean opinion score (MOS) for the two speech samples separately, and then aggregates them into a relative preference score using a preference function. To address the scarcity of preference data, we also construct a new pairwise speech dataset based on a MOS dataset for experiments. Experimental results confirm that, whether in training scenarios with different data types and label conditions, or in both in-domain and out-of-domain test scenarios, the prediction…
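The two-stage idea in the abstract (predict an absolute MOS per sample, then map the two scores to a relative preference) can be sketched as follows. The abstract does not specify the preference function, so the logistic of the MOS difference used here is an assumption, as are the names `preference_score`, `preferred_sample`, and the `scale` parameter; the MOS values passed in stand in for the outputs of a MOS predictor.

```python
import math


def preference_score(mos_a, mos_b, scale=1.0):
    """Assumed preference function: logistic of the MOS difference.

    Returns a value in (0, 1); values above 0.5 favor sample A.
    The paper's actual preference function may differ.
    """
    return 1.0 / (1.0 + math.exp(-scale * (mos_a - mos_b)))


def preferred_sample(mos_a, mos_b):
    """Pick the sample with the higher assumed preference score."""
    return "A" if preference_score(mos_a, mos_b) >= 0.5 else "B"


# Hypothetical predicted MOS values for two speech samples.
score = preference_score(4.0, 3.0)
choice = preferred_sample(4.0, 3.0)
```

One property of this particular choice is symmetry: swapping the two samples gives the complementary score, so the score for (A, B) and the score for (B, A) sum to 1.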
Taxonomy
Topics: Speech and Audio Processing
