GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech

Wenbin Wang; Yang Song; Sanjay Jha

arXiv:2406.14875·cs.SD·June 24, 2024

TL;DR

GLOBE is a comprehensive high-quality English speech corpus with diverse global accents, designed to improve zero-shot speaker adaptive TTS systems by enhancing accent diversity and metadata richness.

Contribution

The paper introduces GLOBE, a large-scale, high-quality English corpus with 23,519 speakers and 164 accents, specifically tailored for advancing zero-shot speaker adaptive TTS.

Findings

1. TTS models trained on GLOBE outperform those trained on other corpora in speaker similarity.
2. GLOBE improves naturalness and accent diversity in synthesized speech.
3. The dataset enhances zero-shot speaker adaptation capabilities.

Abstract

This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519 speakers and covers 164 accents worldwide, along with detailed metadata for these speakers. Compared to its original corpus, i.e., Common Voice, GLOBE significantly improves the quality of the speech data through rigorous filtering and enhancement processes, while also populating all missing speaker metadata. The final curated GLOBE corpus includes 535 hours of speech data at a 24 kHz sampling rate. Our benchmark results indicate that the speaker adaptive TTS model trained on the GLOBE corpus…
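To put the headline figures in the abstract into perspective, the sketch below derives a few rough averages from them. It uses only the numbers quoted above (535 hours, 23,519 speakers, 164 accents, 24 kHz); the function name `corpus_averages` is hypothetical, and the results are plain means. The abstract does not give the actual per-speaker or per-accent distribution, which may well be skewed.

```python
# Figures quoted in the abstract.
TOTAL_HOURS = 535
NUM_SPEAKERS = 23_519
NUM_ACCENTS = 164
SAMPLE_RATE_HZ = 24_000


def corpus_averages(total_hours, num_speakers, num_accents):
    """Return (minutes of speech per speaker, speakers per accent) as plain means.

    Hypothetical helper for illustration; it assumes nothing about how
    speech is actually distributed across speakers or accents.
    """
    minutes_per_speaker = total_hours * 60 / num_speakers
    speakers_per_accent = num_speakers / num_accents
    return minutes_per_speaker, speakers_per_accent


minutes_per_speaker, speakers_per_accent = corpus_averages(
    TOTAL_HOURS, NUM_SPEAKERS, NUM_ACCENTS
)

# Total number of audio samples implied by 535 hours at 24 kHz (mono assumed).
total_samples = TOTAL_HOURS * 3600 * SAMPLE_RATE_HZ

print(f"{minutes_per_speaker:.2f} min/speaker, {speakers_per_accent:.1f} speakers/accent")
print(f"{total_samples:,} audio samples in total")
```

On average this works out to a little over a minute of speech per speaker, which is consistent with a corpus aimed at covering many speakers and accents rather than recording each speaker at length.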

Code & Models

Models

🤗 benjamin-paine/hey-buddy · model · ♡ 10

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems