A Dataset for Automatic Assessment of TTS Quality in Spanish

Alejandro Sosa Welford; Leonardo Pepino

arXiv:2507.01805·cs.SD·July 3, 2025

TL;DR

This paper introduces a Spanish TTS quality assessment dataset of 4,326 audio samples from 52 TTS systems and human voices, labeled through a subjective listening test and used to train naturalness prediction models that reach a mean absolute error of 0.8 on a five-point MOS scale.

Contribution

It provides what is, to the authors' knowledge, the first Spanish dataset for automatic TTS quality assessment, and demonstrates its utility for training naturalness prediction models.

Findings

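01. Models trained on the dataset achieved a mean absolute error of 0.8 on a five-point MOS scale.

02. The dataset contains 4,326 audio samples covering 52 different TTS systems and human voices.

03. Training naturalness prediction systems on the dataset validated its utility for automatic TTS assessment in Spanish.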

Abstract

This work addresses the development of a database for the automatic assessment of text-to-speech (TTS) systems in Spanish, aiming to improve the accuracy of naturalness prediction models. The dataset consists of 4,326 audio samples from 52 different TTS systems and human voices and is, to our knowledge, the first of its kind in Spanish. To label the audio samples, a subjective test was designed based on the ITU-T Rec. P.807 standard and completed by 92 participants. Furthermore, the utility of the collected dataset was validated by training automatic naturalness prediction systems. We explored two approaches: fine-tuning an existing model originally trained for English, and training small downstream networks on top of frozen self-supervised speech models. Our models achieve a mean absolute error of 0.8 on a five-point MOS scale. Further analysis demonstrates the quality and diversity of the…
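For readers unfamiliar with the metric, a mean opinion score (MOS) is the average of the listener ratings given to a sample, and the mean absolute error quoted in the abstract is the average absolute gap between predicted and subjective MOS. The sketch below illustrates both computations in Python. The ratings and predictions are invented for illustration only and do not come from the dataset, and the function names are hypothetical rather than taken from the authors' code.

```python
from statistics import mean


def mos(ratings):
    """Mean opinion score: the average of listener ratings on a 1-5 scale."""
    return mean(ratings)


def mean_absolute_error(predicted, target):
    """Average absolute difference between predicted and subjective MOS."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return mean([abs(p - t) for p, t in zip(predicted, target)])


# Hypothetical listener ratings for three audio samples (illustrative only).
ratings_per_sample = [[4, 5, 4], [2, 3, 1], [3, 3, 3]]
subjective_mos = [mos(r) for r in ratings_per_sample]

# Hypothetical model predictions for the same three samples.
predicted_mos = [4.0, 3.0, 2.5]

mae = mean_absolute_error(predicted_mos, subjective_mos)
print(f"MAE on the five-point MOS scale: {mae:.3f}")
```

Because the error is expressed in the same units as the ratings, an MAE of 0.8 means predictions are off by a little under one point on the five-point scale on average.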

Code & Models

Datasets

asosawelford/es-TTS-subjective-naturalness
dataset · 10 downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders