MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng, Xu Tan, Sheng Zhao, Frank Soong, Xiang-Yang Li, Tao Qin

TL;DR
This paper introduces MBNet, a novel MOS prediction model that utilizes individual judge scores by modeling mean and bias components, leading to improved correlation with human judgments in synthesized speech quality assessment.
Contribution
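To the authors' knowledge, MBNet is the first MOS prediction model to explicitly use each individual judge's score, via a per-judge bias term predicted alongside the mean score, rather than training on the averaged score alone. This makes fuller use of the available MOS training data.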
Findings
MBNet outperforms the MOSNet baseline in system-level Spearman's rank correlation coefficient (SRCC).
MBNet improves system-level SRCC by 2.9% and 6.7% on the VCC 2018 and VCC 2016 datasets, respectively.
The model effectively captures individual judge biases.
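As an illustration of the metric named in these findings, here is a minimal sketch of how a system-level SRCC could be computed: utterance scores are averaged per synthesis system, and the Spearman rank correlation is then taken between the true and predicted system averages. The function names, the record layout, and the toy scores are assumptions made for this example; they are not taken from the paper or its evaluation code.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def system_level_srcc(records):
    """records: iterable of (system_id, true_mos, predicted_mos), one per utterance.

    Hypothetical helper: averages scores per system, then computes Spearman's
    rank correlation as the Pearson correlation of the two rank vectors.
    """
    true_by_system = defaultdict(list)
    pred_by_system = defaultdict(list)
    for system_id, true_mos, predicted_mos in records:
        true_by_system[system_id].append(true_mos)
        pred_by_system[system_id].append(predicted_mos)
    systems = sorted(true_by_system)
    true_means = [mean(true_by_system[s]) for s in systems]
    pred_means = [mean(pred_by_system[s]) for s in systems]
    return pearson(average_ranks(true_means), average_ranks(pred_means))


# Toy data (made up): three systems, two utterances each.
toy_records = [
    ("A", 4.0, 3.5), ("A", 4.5, 3.9),
    ("B", 3.0, 3.0), ("B", 2.5, 2.8),
    ("C", 1.5, 2.0), ("C", 2.0, 2.5),
]
srcc = system_level_srcc(toy_records)
```

Because SRCC depends only on rank order, a predictor that ranks the systems in the same order as the human judges scores perfectly on this metric even when its absolute MOS values are offset, as in the toy data above.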
Abstract
Mean opinion score (MOS) is a popular subjective metric to assess the quality of synthesized speech, and usually involves multiple human judges to evaluate each speech utterance. To reduce the labor cost in MOS test, multiple methods have been proposed to automatically predict MOS scores. To our knowledge, for a speech utterance, all previous works only used the average of multiple scores from different judges as the training target and discarded the score of each individual judge, which did not well exploit the precious MOS training data. In this paper, we propose MBNet, a MOS predictor with a mean subnet and a bias subnet to better utilize every judge score in MOS datasets, where the mean subnet is used to predict the mean score of each utterance similar to that in previous works, and the bias subnet to predict the bias score (the difference between the mean score and each individual…
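To make the two training targets described in the abstract concrete, the sketch below derives a mean score and per-judge bias scores from the raw ratings of one utterance. The function name, the judge identifiers, and the ratings are hypothetical, and the sign convention (judge score minus mean score) is an assumption, since the abstract excerpt here is truncated before it defines the bias precisely.

```python
from statistics import mean


def mean_and_bias_targets(judge_scores):
    """Split one utterance's ratings into a mean target and per-judge bias targets.

    judge_scores: dict mapping a judge id to that judge's score for the utterance.
    Assumption: bias = judge score - mean score (the sign is not confirmed by
    the truncated abstract).
    """
    mean_score = mean(judge_scores.values())
    bias_scores = {judge: score - mean_score for judge, score in judge_scores.items()}
    return mean_score, bias_scores


# Made-up ratings from three hypothetical judges for a single utterance.
ratings = {"judge_1": 4.0, "judge_2": 3.0, "judge_3": 5.0}
mean_target, bias_targets = mean_and_bias_targets(ratings)
```

Under this convention the biases for an utterance sum to zero, so the mean target carries the consensus quality and the bias targets carry only how each judge deviates from it.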
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
