AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

Rong Gong; Hongfei Xue; Lezhi Wang; Xin Xu; Qisheng Li; Lei Xie; Hui Bu; Shaomei Wu; Jiaming Zhou; Yong Qin; Binbin Zhang; Jun Du; Jia Bin; Ming Li

arXiv:2406.07256·cs.SD·June 12, 2024

TL;DR

This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset and the largest of its kind, and reports baseline results showing its value for automatic speech recognition (ASR) and stuttering event detection (SED).

Contribution

The paper provides the first publicly available Mandarin stuttered speech dataset, together with baseline systems for ASR and stuttering event detection on atypical speech.

Findings

01 Significant improvements in ASR accuracy with the dataset

02 Enhanced stuttering event detection performance

03 Increased inclusivity of speech models for atypical speech

Abstract

The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into model fine-tuning, significant improvements in state-of-the-art ASR models, e.g., Whisper and HuBERT, are observed, enhancing…
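
As a rough illustration of how ASR output might be scored against verbatim stuttered transcripts, the sketch below computes character error rate (CER), a metric commonly used for Mandarin ASR. The excerpt above does not say which metric the paper reports, so CER is an assumption here, and the function names and the example sentence are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Spaces are ignored, since Mandarin text is scored per character.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the verbatim reference keeps the repeated "我",
    # while the recognizer output drops the repetitions.
    verbatim_reference = "我我我想打开灯"
    asr_hypothesis = "我想打开灯"
    print(f"CER = {cer(verbatim_reference, asr_hypothesis):.3f}")
```

In this toy case the two dropped repetitions count as deletions against the seven-character verbatim reference, which shows why a model tuned only on fluent speech can score poorly on verbatim stuttered transcripts even when the intended message is recovered.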

Code & Models

Datasets

AImpower/MandarinStutteredSpeech

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Stuttering Research and Treatment