VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Jia-Jyu Su; Pang-Chen Liao; Yen-Ting Lin; Wu-Hao Li; Guan-Ting Liou; Cheng-Che Kao; Wei-Cheng Chen; Jen-Chieh Chiang; Wen-Yang Chang; Pin-Han Lin; Chen-Yu Chiang

arXiv:2308.14763·eess.AS·August 30, 2023·1 cites

Open Access · 1 Repo

TL;DR

VoiceBank-2023 is a multi-speaker Mandarin speech corpus for building personalized TTS systems for speech-impaired individuals, particularly ALS patients. It is released with speaker-level and utterance-level annotations, and the paper reports evaluations of TTS systems built from it.

Contribution

This paper introduces the VoiceBank-2023 corpus, covering its design, recording process, data purging and correction, and the evaluation of personalized TTS systems for Mandarin speakers with speech impairments.

Findings

01

The corpus contains 29.78 hours of speech from 111 native Mandarin speakers.

02

Annotations cover gender, degree of speech impairment, type of user, transcription, SNR, and speaking rate.

03

Evaluations demonstrate the corpus's effectiveness for personalized TTS development.

Abstract

Services providing personalized TTS systems for Mandarin speakers with speech impairments are rarely reported. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services that deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis (ALS) patients. This paper reports the corpus design, corpus recording, data purging and correction, and evaluations of the developed personalized TTS systems for the VoiceBanking project. The corpus is named the VoiceBank-2023 speech corpus after its release year. It contains 29.78 hours of utterances, prompted by short paragraphs and common phrases, spoken by 111 native Mandarin speakers. The corpus is labeled with gender, degree of speech impairment, type of user, transcription, SNR, and speaking rate. VoiceBank-2023 is available by request for…
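To make the abstract's figures and label set concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: the `UtteranceLabel` record and all of its field names are hypothetical, the syllables-per-second unit for speaking rate is an assumption, and `snr_db` is the textbook power-ratio definition rather than whatever estimator the authors used.

```python
import math
from dataclasses import dataclass

# Figures quoted in the abstract.
TOTAL_HOURS = 29.78
NUM_SPEAKERS = 111


@dataclass
class UtteranceLabel:
    """Hypothetical record; field names are illustrative, not the corpus schema."""
    speaker_id: str
    gender: str
    impairment_degree: str
    user_type: str
    transcription: str
    snr_db: float
    speaking_rate: float  # assumed unit: syllables per second


def mean_minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Average amount of recorded speech per speaker, in minutes."""
    return total_hours * 60.0 / num_speakers


def snr_db(signal_power: float, noise_power: float) -> float:
    """Textbook SNR in decibels from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def speaking_rate(num_syllables: int, duration_s: float) -> float:
    """Syllables per second; in Mandarin one character is roughly one syllable."""
    return num_syllables / duration_s
```

With the abstract's totals, `mean_minutes_per_speaker(TOTAL_HOURS, NUM_SPEAKERS)` comes to roughly 16 minutes per speaker. This is only an average; the abstract does not state how recording time is distributed across speakers.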

Code & Models

Repositories

voicebank-ntpu-tw/voicebank-2023
none · Official

Taxonomy

Topics: Voice and Speech Disorders · Dysphagia Assessment and Management