Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech
Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya, Abhishek Pandey

TL;DR
This paper introduces a novel spectral modification data augmentation technique that transforms adult speech spectra into children-like spectra, significantly improving end-to-end children's speech recognition accuracy.
Contribution
The paper proposes a new segmental spectrum warping and formant energy perturbation method for data augmentation to enhance children's speech recognition systems.
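To give a feel for what warping a spectrum means, here is a minimal sketch of a generic linear frequency warp applied to a magnitude spectrogram. It is not the authors' segmental warping or formant energy perturbation, whose details are not given in this summary. The function name `warp_spectrum`, the single warp factor `alpha`, and the `(n_bins, n_frames)` layout are all assumptions made for illustration.

```python
import numpy as np

def warp_spectrum(mag, alpha):
    """Linearly warp the frequency axis of a magnitude spectrogram.

    mag   : array of shape (n_bins, n_frames), hypothetical layout.
    alpha : warp factor; alpha > 1 moves spectral energy toward higher
            frequencies, the direction expected when making adult
            speech sound more child-like.
    """
    mag = np.asarray(mag, dtype=float)
    n_bins = mag.shape[0]
    bins = np.arange(n_bins, dtype=float)
    # Output bin k takes its value from input position k / alpha.
    src = np.clip(bins / alpha, 0.0, n_bins - 1)
    warped = np.empty_like(mag)
    for t in range(mag.shape[1]):
        # Linear interpolation between neighbouring input bins.
        warped[:, t] = np.interp(src, bins, mag[:, t])
    return warped

# Toy input: one frame with a single spectral peak at bin 10.
toy = np.zeros((64, 1))
toy[10, 0] = 1.0
shifted = warp_spectrum(toy, alpha=1.2)
peak_bin = int(np.argmax(shifted[:, 0]))
```

With `alpha=1.2` the toy peak moves from bin 10 to a higher bin, and `alpha=1.0` leaves the input unchanged. A real augmentation pipeline would draw the warp factor at random for each utterance.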
Findings
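6.5% and 6.1% relative WER reduction on the children's dev and test sets, respectively, over a VTLP baseline trained on the Librispeech 100-hour adult speech set.
3.7% and 5.1% relative WER reduction when children's speech data is added to the Librispeech training set.
Spectral augmentation that makes adult spectra more child-like improves end-to-end ASR performance on children's speech.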
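The figures above are relative, not absolute, reductions. As a short sketch of how a relative WER reduction is computed, the snippet below uses the standard definition (baseline minus new, divided by baseline). The function name and the input WER values are hypothetical and do not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical WERs for illustration only (not results from the paper):
# a baseline at 40.0% WER and an improved system at 37.0% WER.
example = relative_wer_reduction(40.0, 37.0)
```

Here a 3.0-point absolute drop from a 40.0% baseline corresponds to a 7.5% relative reduction, which shows why relative figures always need to be read alongside the baseline they refer to.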
Abstract
Training a robust Automatic Speech Recognition (ASR) system for children's speech is challenging because of inherent differences between the acoustic attributes of adult and child speech and the scarcity of publicly available children's speech datasets. In this paper, novel segmental spectrum warping and formant energy perturbation are introduced to generate a children-like speech spectrum from an adult speech spectrum. This modified adult spectrum is then used as augmented data to improve end-to-end ASR systems for children's speech recognition. The proposed data augmentation methods give 6.5% and 6.1% relative reduction in WER on the children's dev and test sets, respectively, compared to the vocal tract length perturbation (VTLP) baseline system trained on the Librispeech 100-hour adult speech dataset. When children's speech data is added in training with Librispeech set, it…
