DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores

Wei-Cheng Tseng; Wei-Tsung Kao; Hung-yi Lee

arXiv:2204.03219 · eess.AS · August 16, 2022

Open Access

TL;DR

This paper introduces DDOS, a model for predicting mean opinion scores (MOS) of synthetic speech. It combines domain adaptive pre-training with opinion score distribution modeling, outperforming previous work on the BVCC dataset and improving zero-shot transfer to the BC2019 dataset.

Contribution

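The paper presents a MOS prediction framework that further pre-trains self-supervised learning models on synthetic speech (domain adaptive pre-training) and adds a module that models the opinion score distribution of each utterance, improving prediction accuracy on BVCC and zero-shot transfer to BC2019.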

Findings

01 Outperforms previous models on the BVCC dataset

02 Significantly improves zero-shot transfer on the BC2019 dataset

03 Achieved second place in the Interspeech 2022 VoiceMOS challenge in terms of system-level score

Abstract

Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation are desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. In addition, a proposed module models the opinion score distribution of each utterance. With these components, DDOS outperforms previous works on the BVCC dataset, and the zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS challenge in terms of system-level score.
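
To make the two quantities in the abstract concrete, here is a minimal sketch of how a MOS relates to the opinion score distribution of a single utterance. It is an illustration only and not the DDOS module, which is a learned neural component described in the paper. The 5-point scale, the function names, and the listener ratings are all assumptions made for this example.

```python
from collections import Counter

# Assumption: opinions are given on a 5-point scale (1 = worst, 5 = best).
SCORES = (1, 2, 3, 4, 5)


def score_distribution(ratings):
    """Return the empirical probability of each score for one utterance."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[s] / total for s in SCORES]


def expected_mos(distribution):
    """Return the MOS as the expected value of a distribution over SCORES."""
    return sum(s * p for s, p in zip(SCORES, distribution))


# Hypothetical ratings from eight listeners for one utterance.
ratings = [4, 4, 5, 3, 4, 2, 4, 5]
dist = score_distribution(ratings)
mos = expected_mos(dist)

print(dist)  # probability of scores 1..5
print(mos)   # equals the plain average of the ratings
```

The distribution keeps information that the mean discards: two utterances can share the same MOS while listeners agree on one and disagree strongly on the other.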

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling