# CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

**Authors:** Kyubyong Park, Thomas Mulc

arXiv: 1903.11269 · 2019-08-06

## TL;DR

CSS10 is a collection of single-speaker speech datasets for ten languages, built from short LibriVox audiobook clips with aligned text and intended as a resource for speech synthesis research.

## Contribution

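The paper introduces a collection of single-speaker speech datasets in ten languages, each pairing short audio clips with aligned text. The authors validate dataset quality by training two neural text-to-speech (TTS) models on each dataset and running Mean Opinion Score (MOS) tests on the synthesized speech.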

## Key findings

- Single-speaker speech datasets are provided for ten languages, drawn from LibriVox audiobooks.
- Neural TTS models trained on these datasets produce intelligible speech.
- The datasets, pre-trained models, and test resources are publicly available for future research.

## Abstract

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pre-trained models, and test resources publicly available. We hope they will be used for future speech tasks.
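
As a rough illustration of the Mean Opinion Score step mentioned above, the sketch below averages listener ratings and attaches a normal-approximation 95% confidence interval. It is not the authors' evaluation code: the 1-5 rating scale, the function name `mean_opinion_score`, and the sample ratings are all assumptions made for this example.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of listener ratings.

    Assumes a 1-5 rating scale (an assumption, not taken from the paper).
    half_width is the normal-approximation confidence half-width; z=1.96
    corresponds to a 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings for one synthesized sample, for illustration only.
ratings = [4, 5, 3, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of raters the interval is wide, so a per-language comparison would typically pool ratings over many samples and listeners before drawing conclusions.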

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.11269/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1903.11269/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1903.11269/full.md

---
Source: https://tomesphere.com/paper/1903.11269