Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

Hemant Yadav; Erica Cooper; Junichi Yamagishi; Sunayana Sitaram; Rajiv Ratn Shah

arXiv:2310.05078·eess.AS·October 10, 2023

TL;DR

This paper proposes a partial rank similarity (PRS) loss function for predicting speech quality scores. Trained on the relative ranking of predictions within a mini-batch rather than on absolute MOS values, it outperforms the L1 loss on unseen speech synthesis systems in zero-shot and semi-supervised settings.

Contribution

The paper introduces the PRS loss function, which emphasizes rank order when training MOS prediction models and improves zero-shot and semi-supervised performance on unseen speech synthesis systems.

Findings

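01  PRS-trained models correlate more strongly with ground truth than L1-trained models on out-of-domain systems

02  Considering rank order during training makes MOS prediction more robust

03  Mean squared error and the linear correlation coefficient may be unreliable metrics for evaluating MOS prediction models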
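
As a toy illustration of the third finding (not the paper's own argument or data), the sketch below compares mean squared error with a rank-based metric on made-up MOS values. The numbers, the helper names `mse`, `ranks`, and `spearman`, and the choice of Spearman rank correlation as the rank-based metric are all assumptions made for this example.

```python
def mse(pred, true):
    """Mean squared error between predicted and true scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def ranks(values):
    """Rank of each value (0 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(pred, true):
    """Spearman rank correlation for tie-free data."""
    n = len(true)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(true)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up system-level MOS values, for illustration only.
true_mos = [2.8, 3.0, 3.2, 3.4, 3.6]
shifted = [t + 1.0 for t in true_mos]    # constant bias, correct order
scrambled = [3.4, 3.6, 3.2, 2.8, 3.0]    # close in value, wrong order

for name, pred in [("shifted", shifted), ("scrambled", scrambled)]:
    print(f"{name}: MSE={mse(pred, true_mos):.3f}, "
          f"Spearman={spearman(pred, true_mos):.3f}")
```

In this example the biased predictions have the higher MSE (about 1.0 versus about 0.288) even though they order the systems perfectly (Spearman 1.0 versus -0.8), which is one way a value-based metric can disagree with a rank-based one.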

Abstract

This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of the relative positions of predicted MOS values within a mini-batch, rather than the actual MOS values. That is, the partial rank similarity (PRS) is measured rather than the individual MOS values, as with the L1 loss. Our experiments on out-of-domain speech synthesis systems demonstrate that PRS outperforms the L1 loss in zero-shot and semi-supervised settings, exhibiting stronger correlation with ground truth. These findings highlight the importance of considering rank order, as done by PRS, when training MOS prediction models. We also argue that mean squared error and linear correlation coefficient metrics may be unreliable for evaluating MOS prediction models. In conclusion, PRS-trained models provide a robust…
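
The contrast between scoring absolute values and scoring relative positions within a mini-batch can be sketched as follows. This is a minimal, non-differentiable stand-in and not the paper's PRS formulation, which the abstract does not spell out. The function `pairwise_order_disagreement` and the example batch are hypothetical and exist only for illustration.

```python
from itertools import combinations


def l1_loss(pred, target):
    """Mean absolute error over the mini-batch (value-based)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_order_disagreement(pred, target):
    """Fraction of pairs in the mini-batch whose relative order differs
    between predictions and targets (rank-based, hypothetical stand-in)."""
    pairs = list(combinations(range(len(target)), 2))
    bad = sum(
        1
        for i, j in pairs
        if sign(pred[i] - pred[j]) != sign(target[i] - target[j])
    )
    return bad / len(pairs)


# Made-up mini-batch of MOS targets and two sets of predictions.
batch_target = [2.0, 3.0, 4.0]
pred_a = [3.0, 4.0, 5.0]  # offset by +1.0, order preserved
pred_b = [2.5, 2.4, 4.0]  # closer in value, first two items swapped

for name, pred in [("pred_a", pred_a), ("pred_b", pred_b)]:
    print(f"{name}: L1={l1_loss(pred, batch_target):.3f}, "
          f"order disagreement={pairwise_order_disagreement(pred, batch_target):.3f}")
```

Here `pred_a` has the larger L1 loss but no ordering errors, while `pred_b` has the smaller L1 loss but misorders one of three pairs. A rank-oriented objective would prefer `pred_a`, whereas a value-oriented one would prefer `pred_b`.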

Code & Models

Repositories

nii-yamagishilab/partial_rank_similarity
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing