Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously
Cheng-Han Chiang, Wei-Ping Huang, Hung-yi Lee

TL;DR
This paper highlights the critical need for detailed reporting in subjective speech synthesis evaluations, demonstrating how such details influence results and proposing more rigorous reporting standards to enhance reliability.
Contribution
It provides an analysis of current reporting deficiencies and empirically shows how experiment details affect TTS evaluation outcomes, advocating for improved transparency.
Findings
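Evaluation outcomes vary significantly with experimental details: MOS tests on three TTS systems under different settings yield at least three distinct rankings
Reporting in the 80 INTERSPEECH 2022 papers analyzed is often incomplete on evaluator recruitment and filtering, instructions and payments, and evaluators' geographic and linguistic backgrounds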
More rigorous reporting can improve the reliability of subjective evaluations
Abstract
This paper emphasizes the importance of reporting experiment details in subjective evaluations and demonstrates how such details can significantly impact evaluation results in the field of speech synthesis. Through an analysis of 80 papers presented at INTERSPEECH 2022, we find a lack of thorough reporting on critical details such as evaluator recruitment and filtering, instructions and payments, and the geographic and linguistic backgrounds of evaluators. To illustrate the effect of these details on evaluation outcomes, we conduct mean opinion score (MOS) tests on three well-known TTS systems under different evaluation settings and obtain at least three distinct rankings of the TTS models. We urge the community to report experiment details in subjective evaluations to improve the reliability and interpretability of experimental results.
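To make the ranking effect described in the abstract concrete, here is a minimal sketch of how a MOS ranking can flip between evaluation settings. The setting names, system names, and all ratings are hypothetical placeholders invented for illustration; they are not data or results from the paper, and the paper's own analysis may differ.

```python
from statistics import mean

# Hypothetical 1-5 ratings keyed by (evaluation setting, system).
# All values are invented for illustration; they are NOT results from the paper.
ratings = {
    ("setting_a", "system_x"): [4, 5, 4, 4],
    ("setting_a", "system_y"): [3, 4, 4, 3],
    ("setting_a", "system_z"): [3, 3, 2, 3],
    ("setting_b", "system_x"): [3, 3, 4, 3],
    ("setting_b", "system_y"): [4, 4, 5, 4],
    ("setting_b", "system_z"): [3, 3, 3, 3],
}


def mos_by_setting(ratings):
    """Return {setting: {system: MOS}}, where MOS is the mean rating."""
    table = {}
    for (setting, system), scores in ratings.items():
        table.setdefault(setting, {})[system] = mean(scores)
    return table


def ranking(mos):
    """Order systems from highest to lowest MOS."""
    return sorted(mos, key=mos.get, reverse=True)


mos_table = mos_by_setting(ratings)
rankings = {setting: ranking(mos) for setting, mos in mos_table.items()}

for setting, order in rankings.items():
    print(setting, order)
```

With these made-up ratings, the top two systems swap places between the two settings. That is why a reported MOS ranking is hard to interpret without knowing the setting (evaluator pool, instructions, and so on) under which it was collected.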
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
