Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Dareen Alharthi; Roshan Sharma; Hira Dhamyal; Soumi Maiti; Bhiksha Raj; Rita Singh

arXiv:2310.00706·cs.CL·October 3, 2023

TL;DR

This paper proposes a novel evaluation method for synthetic speech by training speech recognition models on synthetic data and testing on real speech, providing a broader quality assessment beyond traditional intelligibility metrics.

Contribution

It introduces an evaluation metric: the WER that an ASR model reaches on real speech after being trained on synthetic speech. The metric correlates well with human judgments.

Findings

01. The proposed metric correlates strongly with MOS.

02. It outperforms existing automatic metrics such as SpeechLMScore and MOSNet.

03. The method is validated on three recent TTS systems.
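To make the first finding concrete, agreement between an automatic metric and MOS is commonly summarized with a correlation coefficient computed across TTS systems. This page does not say which coefficient the authors report, so the sketch below uses Spearman rank correlation as an assumption. The function names `average_ranks` and `spearman` are hypothetical, and the numbers in the usage lines are placeholders, not results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values for three hypothetical systems (NOT from the paper).
wer_per_system = [0.12, 0.20, 0.35]  # lower is better
mos_per_system = [4.2, 3.9, 3.1]     # higher is better
rho = spearman(wer_per_system, mos_per_system)
```

Because lower WER and higher MOS both indicate better speech, a metric that tracks human judgment should give a strongly negative coefficient here. With only three systems, any such coefficient is a coarse summary.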

Abstract

Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal but inefficient due to its high cost. Researchers have therefore developed auxiliary automatic metrics, such as Word Error Rate (WER), to measure intelligibility. Prior work evaluates synthetic speech with pre-trained speech recognition models, which is limiting because that approach primarily measures speech intelligibility. In this paper, we propose an evaluation technique that trains an ASR model on synthetic speech and assesses its performance on real speech. Our main assumption is that by training the ASR model on the synthetic speech, the WER on real speech reflects…
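The quantity at the center of the proposed evaluation is WER measured on real speech. As a minimal sketch of that scoring step only, the code below computes WER from word-level edit distance, assuming whitespace tokenization and no text normalization. Training the ASR model on synthetic speech and transcribing the real test set are not shown; the function names `edit_distance`, `word_error_rate`, and `corpus_wer` are hypothetical and not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of
    # ref_words and hyp_words[:j].
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER for one utterance: edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs, pooling errors and words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_errors / total_words
```

In the setting the abstract describes, each reference would be the ground-truth transcript of a real utterance and each hypothesis the output of the ASR model trained on synthetic speech. Pooling errors over the whole test set, as `corpus_wer` does, weights long utterances more heavily than averaging per-utterance WER would.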

Code & Models

Repositories

soham97/pam
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems

Methods: Focus