SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level

Hitomi Jin Ling Tee; Chaoren Wang; Zijie Zhang; Zhizheng Wu

arXiv:2510.26190 · cs.SD · October 31, 2025

TL;DR

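This paper introduces SP-MCQA (Spoken-Passage Multiple-Choice Question Answering), a subjective evaluation method that assesses TTS intelligibility by the accuracy of key information rather than word-by-word accuracy. It reveals a gap between current metrics such as WER and practical intelligibility, and argues for more realistic evaluation standards.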

Contribution

The paper proposes SP-MCQA as a novel evaluation approach and releases SP-MCQA-Eval, an 8.76-hour news-style benchmark dataset, exposing the limitations of traditional metrics such as WER in capturing true speech intelligibility.

Findings

01. Low WER does not ensure high key-information accuracy.

02. State-of-the-art models lack robust text normalization and phonetic accuracy.

03. Traditional metrics may not reflect real-world speech comprehension.
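
The first finding can be illustrated with a toy example. The sketch below computes WER as word-level edit distance divided by reference length, then checks whether one key fact survives in a transcript. The sentences, the `word_error_rate` helper, and the simple key-fact check are hypothetical illustrations and are not taken from the paper or from SP-MCQA-Eval; the paper's actual method relies on human listeners answering multiple-choice questions, which this check only loosely stands in for.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misheard word changes the key figure.
reference = ("the company reported revenue of fifteen million dollars "
             "in the third quarter")
hypothesis = ("the company reported revenue of fifty million dollars "
              "in the third quarter")

wer = word_error_rate(reference, hypothesis)

# Crude stand-in for a key-information check: is the key word still there?
key_fact = "fifteen"
key_fact_preserved = key_fact in hypothesis.split()

print(f"WER: {wer:.3f}")
print(f"Key fact preserved: {key_fact_preserved}")
```

Here a single substitution out of twelve reference words gives a low WER, yet the one figure a listener most needs is wrong. That is the kind of mismatch between word-level accuracy and key-information accuracy that the findings describe.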

Abstract

The evaluation of intelligibility for TTS has reached a bottleneck, as existing assessments heavily rely on word-by-word accuracy metrics such as WER, which fail to capture the complexity of real-world speech or reflect human comprehension needs. To address this, we propose Spoken-Passage Multiple-Choice Question Answering, a novel subjective approach evaluating the accuracy of key information in synthesized speech, and release SP-MCQA-Eval, an 8.76-hour news-style benchmark dataset for SP-MCQA evaluation. Our experiments reveal that low WER does not necessarily guarantee high key-information accuracy, exposing a gap between traditional metrics and practical intelligibility. SP-MCQA shows that even state-of-the-art (SOTA) models still lack robust text normalization and phonetic accuracy. This work underscores the urgent need for high-level, more life-like evaluation criteria now that…
