Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Hongfei Xue; Rong Gong; Mingchen Shao; Xin Xu; Lezhi Wang; Lei Xie; Hui Bu; Jiaming Zhou; Yong Qin; Jun Du; Ming Li; Binbin Zhang; Bin Jia

arXiv:2409.05430·eess.AS·September 10, 2024

TL;DR

This paper reports on the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge, covering the dataset, the challenge tracks, and a performance analysis of the top systems, with the aim of advancing speech technologies for people who stutter.

Contribution

It presents the open-source AS-70 dataset with new training and test splits for the challenge, defines tracks for stuttering event detection and stuttered-speech recognition, and analyzes the top systems' performance to promote specialized models for stuttered speech.

Findings

01. Improved detection accuracy with specialized models

02. Reduced recognition error rates through augmentation strategies

03. Demonstrated potential of tailored approaches for stuttered speech

Abstract

The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED, which aims to develop systems for detecting stuttering events; (2) ASR, which focuses on creating robust systems for recognizing stuttered speech; and (3) a Research track for innovative approaches utilizing the provided dataset. We use an open-source Mandarin stuttering dataset, AS-70, which has been split into new training and test sets for the challenge. This paper presents the dataset, details the challenge tracks, and analyzes the performance of the top systems, highlighting improvements in detection accuracy and reductions in recognition error rates. Our findings underscore the potential of specialized models and augmentation…

Taxonomy

Topics: Speech Recognition and Synthesis