Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously

Cheng-Han Chiang; Wei-Ping Huang; Hung-yi Lee

arXiv:2306.02044 · eess.AS · June 6, 2023 · Interspeech · 1 cites

TL;DR

This paper highlights the critical need for detailed reporting in subjective speech synthesis evaluations, demonstrating how such details influence results and proposing more rigorous reporting standards to enhance reliability.

Contribution

It provides an analysis of current reporting deficiencies and empirically shows how experiment details affect TTS evaluation outcomes, advocating for improved transparency.

Findings

01 Evaluation outcomes vary significantly with different experimental details

02 Current reporting practices are often incomplete or inconsistent

03 More rigorous reporting can improve the reliability of subjective evaluations

Abstract

This paper emphasizes the importance of reporting experiment details in subjective evaluations and demonstrates how such details can significantly impact evaluation results in the field of speech synthesis. Through an analysis of 80 papers presented at INTERSPEECH 2022, we find a lack of thorough reporting on critical details such as evaluator recruitment and filtering, instructions and payments, and the geographic and linguistic backgrounds of evaluators. To illustrate the effect of these details on evaluation outcomes, we conduct mean opinion score (MOS) tests on three well-known TTS systems under different evaluation settings and obtain at least three distinct rankings of the TTS models. We urge the community to report experiment details in subjective evaluations to improve the reliability and interpretability of experimental results.
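To make the abstract's point concrete, here is a minimal sketch of how a MOS ranking can flip depending on which evaluators are included. Everything in it is an assumption for illustration: the system names, the evaluator groups, the 1 to 5 rating scale, the normal-approximation confidence interval, and the ratings themselves are made up and are not data or code from the paper. It uses two hypothetical systems rather than the paper's three to keep the example short.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical ratings: (system, evaluator_group, score on a 1-5 scale).
# These numbers are invented for illustration; they are NOT results from the paper.
RATINGS = [
    ("system_a", "group_1", 4), ("system_a", "group_1", 5), ("system_a", "group_1", 4),
    ("system_a", "group_2", 3), ("system_a", "group_2", 3), ("system_a", "group_2", 4),
    ("system_b", "group_1", 4), ("system_b", "group_1", 3), ("system_b", "group_1", 4),
    ("system_b", "group_2", 4), ("system_b", "group_2", 5), ("system_b", "group_2", 4),
]


def mos_table(ratings, group=None):
    """Return {system: (mos, ci95_half_width)}, optionally for one evaluator group."""
    by_system = defaultdict(list)
    for system, evaluator_group, score in ratings:
        if group is None or evaluator_group == group:
            by_system[system].append(score)

    table = {}
    for system, scores in by_system.items():
        mos = mean(scores)
        # Normal-approximation 95% interval; undefined with fewer than two ratings.
        if len(scores) > 1:
            half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
        else:
            half_width = float("nan")
        table[system] = (mos, half_width)
    return table


def ranking(table):
    """Order systems from highest to lowest MOS."""
    return sorted(table, key=lambda system: table[system][0], reverse=True)


if __name__ == "__main__":
    for group in (None, "group_1", "group_2"):
        table = mos_table(RATINGS, group)
        label = group or "all evaluators"
        print(label, ranking(table), {s: round(m, 2) for s, (m, _) in table.items()})
```

With these toy numbers, the two hypothetical evaluator groups produce opposite rankings, which is why a reported MOS is hard to interpret without knowing who the evaluators were and how they were filtered.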

Code & Models

Repositories

d223302/subjectiveevaluation
none · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems