Children's Speech Recognition through Discrete Token Enhancement
Vrunda N. Sukhadia, Shammur Absar Chowdhury

TL;DR
This paper explores the use of discrete speech tokens to improve children's speech recognition systems, addressing data scarcity and privacy issues while maintaining performance and reducing model complexity.
Contribution
It introduces a method for integrating discrete tokens into children's ASR systems, evaluates single-view and multi-view strategies for creating these tokens, and tests how well the resulting models generalize.
Findings
Discrete token ASR achieves nearly equivalent performance to traditional models.
Model complexity is reduced by approximately 83%.
The models generalize effectively to unseen-domain and nativity datasets.
Abstract
Children's speech recognition is considered a low-resource task mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes, and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information but capture both linguistic and acoustic information could be a solution for privacy concerns. In this study, we investigate the integration of discrete speech tokens into children's speech recognition systems as input without significantly degrading the ASR performance. Additionally, we explore single-view and multi-view strategies for creating these discrete labels. Furthermore, we test the models' generalization capabilities on an unseen-domain dataset and a nativity dataset. Results reveal that the discrete token ASR for children achieves nearly…
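The abstract does not say how the discrete tokens are produced. As a rough illustration only, the sketch below assumes one common approach: cluster frame-level feature vectors with k-means and use each frame's cluster index as its token. The function names (`fit_codebook`, `assign_tokens`, `collapse_repeats`), the toy 2-dimensional "frames", and the choice to merge consecutive repeated tokens are all hypothetical and are not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_tokens(frames, centroids):
    """Map each frame to the index of its nearest centroid (its token id)."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(f, centroids[j]))
        for f in frames
    ]


def fit_codebook(frames, k, iters=10, seed=0):
    """Plain k-means: assumed stand-in for whatever tokenizer the paper uses."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct frames.
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        ids = assign_tokens(frames, centroids)
        for j in range(k):
            members = [f for f, i in zip(frames, ids) if i == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def collapse_repeats(ids):
    """Merge runs of identical consecutive tokens (an assumed, optional step)."""
    out = []
    for i in ids:
        if not out or out[-1] != i:
            out.append(i)
    return out


# Toy stand-in for frame-level speech features (hypothetical values).
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.0, 0.0]]
codebook = fit_codebook(frames, k=2)
tokens = collapse_repeats(assign_tokens(frames, codebook))
```

In this sketch the ASR model would consume `tokens` instead of the raw waveform or continuous features, which is the sense in which the token sequence, rather than the original audio, is what gets stored or shared.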
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
