Deep Learning Based Assessment of Synthetic Speech Naturalness

Gabriel Mittag; Sebastian Möller

arXiv:2104.11673·cs.SD·April 26, 2021

TL;DR

This paper introduces a new deep learning model for objectively assessing the naturalness of synthetic speech, applicable across languages and trained on diverse datasets.

Contribution

It presents a novel CNN-LSTM based model for speech naturalness prediction, enhanced by transfer learning from speech quality models, and makes the tool publicly available.

Findings

01

The model was trained and tested on 16 datasets, including data from the Blizzard Challenge and the Voice Conversion Challenge.

02

Transfer learning from speech quality prediction models trained on objective POLQA scores improves the reliability of naturalness prediction.

03

The model is language independent and trained end-to-end.

Abstract

In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech or Voice Conversion systems and is language independent. The model is trained end-to-end and is based on a CNN-LSTM network that has previously given good results for speech quality estimation. We trained and tested the model on 16 different datasets, including data from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is made publicly available and can, for example, be used to evaluate different TTS system configurations.
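
The abstract notes that the model can be used to compare different TTS system configurations. A minimal sketch of that comparison follows, assuming per-utterance naturalness predictions are already available as (system, score) pairs. The system names, the scores, the 1 to 5 scale, and the helper `rank_systems` are all hypothetical and chosen for illustration only; this is not the NISQA API.

```python
from collections import defaultdict
from statistics import mean


def rank_systems(utterance_scores):
    """Average predicted naturalness per system and sort best-first.

    utterance_scores: iterable of (system_name, predicted_score) pairs.
    Returns a list of (system_name, mean_score), highest mean first.
    """
    by_system = defaultdict(list)
    for system, score in utterance_scores:
        by_system[system].append(score)
    system_means = {name: mean(scores) for name, scores in by_system.items()}
    return sorted(system_means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical predictions (assumed 1-5 scale), NOT results from the paper.
predictions = [
    ("config_a", 3.5), ("config_a", 4.0), ("config_a", 3.0),
    ("config_b", 4.5), ("config_b", 4.0), ("config_b", 3.5),
]

ranking = rank_systems(predictions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Averaging over utterances before comparing systems is a common convention for listening-test style scores; whether it suits a given evaluation depends on how many utterances each configuration has.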

Code & Models

Repositories

gabrielmittag/NISQA
PyTorch · Official

Datasets

hewliyang/nisqa-blizzard-challenge-mos
dataset · 17 dl
