HUI-Audio-Corpus-German: A high quality TTS dataset
Pascal Puchtler, Johannes Wirth, René Peinl

TL;DR
This paper introduces HUI-Audio-Corpus-German, a high-quality, open-source dataset designed to improve German TTS systems by providing well-aligned audio and text data, addressing previous quality and resource limitations.
Contribution
The paper presents a new large-scale German TTS dataset with a processing pipeline that enhances audio quality and alignment, reducing manual effort in dataset creation.
Findings
High-quality audio-text alignments achieved
Reduces manual effort in dataset creation
Supports improved German TTS development
Abstract
The increasing availability of audio data on the internet has led to a multitude of datasets for the development and training of text-to-speech applications based on neural networks. Highly variable voice quality, low sampling rates, a lack of text normalization, and poor alignment of audio samples to their corresponding transcript sentences still limit the performance of deep neural networks trained on this task. Additionally, data resources in languages like German are still very limited. We introduce the "HUI-Audio-Corpus-German", a large, open-source dataset for TTS engines, created with a processing pipeline that produces high-quality audio-to-transcription alignments and decreases the manual effort needed for creation.
