End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

Prashanth Gurunath Shivakumar; Shrikanth Narayanan

arXiv:2102.09918·eess.AS·February 22, 2021

TL;DR

This paper empirically evaluates state-of-the-art end-to-end neural speech recognition systems on children's speech, highlighting how the variability of children's speech and the scarcity of suitable training data limit performance.

Contribution

The paper provides a comprehensive assessment of current end-to-end speech recognition systems applied to children's speech, analyzing training data requirements, adaptation on children's data, and the effect of factors such as speaker age and utterance length on accuracy.

Findings

01

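Children's speech recognition is more challenging than adult speech recognition because of larger intra- and inter-speaker variability in acoustic and linguistic characteristics.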

02

Training data quality and quantity significantly affect performance.

03

Language models and architecture choices influence recognition accuracy.

Abstract

A key desideratum for inclusive and accessible speech recognition technology is robust performance on children's speech. Notably, this includes the rapidly advancing neural network based end-to-end speech recognition systems. Children's speech recognition is more challenging than adult speech recognition because of larger intra- and inter-speaker variability in acoustic and linguistic characteristics. Furthermore, the lack of adequate and appropriate children's speech resources adds to the challenge of designing robust end-to-end neural architectures. This study provides a critical assessment of automatic children's speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems. Insights are provided on the aspects of training data requirements, adaptation on children's data, and the effect of children's age, utterance lengths,…
