ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5

Jiaming Zhou; Shiyao Wang; Shiwan Zhao; Jiabei He; Haoqin Sun; Hui Wang; Cheng Liu; Aobo Kong; Yujie Guo; Xi Yang; Yequan Wang; Yonghua Lin; Yong Qin

arXiv:2409.18584 · cs.SD · March 20, 2025

TL;DR

This paper introduces ChildMandarin, a comprehensive Mandarin speech dataset for children aged 3-5, enabling improved ASR and speaker verification research for young children's speech in Mandarin.

Contribution

The paper presents a new, large-scale Mandarin speech dataset for young children, with detailed analysis and evaluation of ASR and SV models, addressing a critical resource gap.

Findings

01. Fine-tuning pre-trained models significantly improves ASR performance.

02. The dataset supports effective speaker verification despite children's vocal variability.

03. ASR models trained from scratch show promising results on child speech.

Abstract

Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children's speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. The dataset comprises 41.25 hours of speech with carefully crafted manual transcriptions, collected from 397 speakers across various provinces in China, with balanced gender representation. We provide a comprehensive analysis of speaker demographics, speech duration distribution and geographic coverage. Additionally, we evaluate ASR performance on models trained from scratch, such as Conformer, as well as fine-tuned…
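
The visible part of the abstract does not say which metric the ASR evaluation uses. Mandarin ASR is commonly scored with character error rate (CER), so the sketch below assumes that metric for illustration only. The function names `edit_distance` and `cer` are hypothetical, and the sample transcripts are invented, not taken from ChildMandarin.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counts substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length.
    Whitespace is dropped, since Mandarin text is scored per character."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Invented example: one substituted character out of five.
    print(cer("我想吃苹果", "我想吃平果"))  # 0.2
```

Scoring per character avoids depending on a word segmenter, which is one reason CER is often preferred over word error rate for Mandarin.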

Code & Models

Repositories

flageval-baai/childmandarin (official)

Datasets

BAAI/ChildMandarin
dataset · 81 downloads

Taxonomy

Topics: Speech Recognition and Synthesis