HUI-Audio-Corpus-German: A high quality TTS dataset

Pascal Puchtler; Johannes Wirth; René Peinl

arXiv:2106.06309·cs.SD·June 14, 2021

TL;DR

This paper introduces HUI-Audio-Corpus-German, a high-quality, open-source dataset designed to improve German TTS systems by providing well-aligned audio and text data, addressing previous quality and resource limitations.

Contribution

The paper presents a new large-scale German TTS dataset with a processing pipeline that enhances audio quality and alignment, reducing manual effort in dataset creation.

Findings

01 High-quality audio-text alignments achieved
02 Reduces manual effort in dataset creation
03 Supports improved German TTS development

Abstract

The increasing availability of audio data on the internet has led to a multitude of datasets for the development and training of text-to-speech applications based on neural networks. Highly differing voice quality, low sampling rates, a lack of text normalization and disadvantageous alignment of audio samples to the corresponding transcript sentences still limit the performance of deep neural networks trained on this task. Additionally, data resources in languages like German are still very limited. We introduce the "HUI-Audio-Corpus-German", a large, open-source dataset for TTS engines, created with a processing pipeline that produces high-quality audio-to-transcription alignments and decreases the manual effort needed for creation.

Code & Models

Repositories

iisys-hof/HUI-Audio-Corpus-German
Official

Datasets

Paradoxia/opendata-iisys-hui
dataset · 69 dl
