Preference-based training framework for automatic speech quality assessment using deep neural network

Cheng-Hung Hu; Yusuke Yasuda; Tomoki Toda

arXiv:2308.15203 · eess.AS · August 30, 2023 · Interspeech

TL;DR

This paper introduces a preference-based training framework for speech quality assessment using deep neural networks, focusing on ranking synthetic speech systems more effectively than traditional score-based methods.

Contribution

It proposes a novel training framework that leverages preference scores from MOS pairs to improve system ranking accuracy in speech quality assessment.

Findings

01. The framework outperforms the baseline in Spearman's rank correlation coefficient.

02. Effective pair generation and aggregation functions are identified.

03. The conditions under which the framework performs best are analyzed.
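
Since the first finding is stated in terms of Spearman's rank correlation, a minimal sketch of that metric may help. It ranks both score lists (ties get their average rank) and computes the Pearson correlation of the ranks. The function names and the sample numbers are illustrative assumptions, not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical system-level scores: ground-truth MOS vs. model output.
true_mos = [3.1, 4.2, 2.0]
predicted = [0.2, 0.9, -0.5]
rho = spearman(true_mos, predicted)
```

Only the ordering of the predicted scores matters here, which is why a model that outputs preference-based scores on an arbitrary scale can still be evaluated against MOS-based system rankings.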

Abstract

One objective of Speech Quality Assessment (SQA) is to estimate the ranks of synthetic speech systems. However, recent SQA models are typically trained using low-precision direct scores such as mean opinion scores (MOS) as the training objective, from which rankings are not straightforward to estimate. Although it is effective for predicting quality scores of individual sentences, this approach does not account for speech and system preferences when ranking multiple systems. We propose a training framework for SQA models that can be trained with only preference scores derived from pairs of MOS to improve ranking prediction. Our experiment reveals conditions where our framework works best in terms of pair generation, aggregation functions to derive system scores from utterance preferences, and threshold functions to determine preference from a pair of MOS. Our results demonstrate that our…
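
The abstract names three ingredients: a threshold function that turns a pair of MOS into a preference, pair generation, and an aggregation function that derives a system score from utterance preferences. The sketch below illustrates one possible combination of these. The fixed threshold of 0.5, the all-cross-system pairing, and the mean-preference aggregation are assumptions chosen for illustration, not the configurations the paper evaluates, and the example applies them directly to MOS values rather than to model predictions.

```python
from itertools import combinations


def preference(mos_a, mos_b, threshold=0.5):
    """Threshold function (assumed form): +1 if A is preferred, -1 if B is,
    0 if the MOS difference does not exceed the threshold."""
    diff = mos_a - mos_b
    if diff > threshold:
        return 1
    if diff < -threshold:
        return -1
    return 0


def system_scores(utterances, threshold=0.5):
    """Aggregate utterance-level preferences into one score per system.

    utterances: list of (system_id, mos) tuples.
    Pair generation (assumed): every pair of utterances from different systems.
    Aggregation (assumed): mean preference over all pairs a system appears in.
    """
    total, count = {}, {}
    for (sys_a, mos_a), (sys_b, mos_b) in combinations(utterances, 2):
        if sys_a == sys_b:
            continue  # only compare utterances across systems
        p = preference(mos_a, mos_b, threshold)
        total[sys_a] = total.get(sys_a, 0) + p
        total[sys_b] = total.get(sys_b, 0) - p
        count[sys_a] = count.get(sys_a, 0) + 1
        count[sys_b] = count.get(sys_b, 0) + 1
    return {s: total[s] / count[s] for s in total}


# Hypothetical data: three systems with made-up utterance-level MOS.
data = [("A", 4.5), ("A", 4.0), ("B", 3.0), ("B", 2.5), ("C", 3.2)]
scores = system_scores(data)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The resulting scores have no meaning as absolute quality values; they only order the systems, which matches the ranking objective the abstract describes.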
