Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev

TL;DR
This paper introduces a dynamic acoustic unit augmentation method using BPE-dropout to improve low-resource end-to-end speech recognition, especially for handling OOV words, with significant accuracy gains.
Contribution
The proposed BPE-dropout based augmentation technique enhances recognition of unseen words and reduces vocabulary search complexity in low-resource ASR systems.
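This summary names BPE-dropout but does not describe the procedure. Below is a minimal sketch of generic BPE-dropout (Provilkov et al., 2020), not the authors' implementation. When a word is segmented, every applicable merge is skipped with probability p at each step, so the same word can come out as different subword sequences. The function name `bpe_dropout_segment` and the toy merge table `TOY_MERGES` are hypothetical. A real system would use a merge table learned from the training transcripts.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy merge table: pair -> priority (lower = learned earlier = applied first).
TOY_MERGES: Dict[Tuple[str, str], int] = {
    ("l", "o"): 0,
    ("lo", "w"): 1,
    ("e", "r"): 2,
    ("low", "er"): 3,
}


def bpe_dropout_segment(
    word: str,
    merges: Dict[Tuple[str, str], int],
    p: float,
    rng: random.Random,
) -> List[str]:
    """Segment one word with BPE, dropping each candidate merge with probability p.

    p == 0 gives ordinary deterministic BPE; p == 1 returns single characters.
    """
    symbols = list(word)
    while len(symbols) > 1:
        # Collect adjacent pairs that are in the merge table and survive dropout
        # at this step (rng.random() is in [0, 1), so p=0 keeps all, p=1 drops all).
        candidates = []
        for i in range(len(symbols) - 1):
            pair = (symbols[i], symbols[i + 1])
            if pair in merges and rng.random() >= p:
                candidates.append((merges[pair], i))
        if not candidates:
            break  # no merge applies or every applicable merge was dropped: stop
        # Apply the highest-priority surviving merge (leftmost on ties).
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


if __name__ == "__main__":
    rng = random.Random(0)
    # Each call may yield a different segmentation of the same word.
    for _ in range(3):
        print(bpe_dropout_segment("lower", TOY_MERGES, p=0.1, rng=rng))
```

With p = 0 the function reduces to ordinary deterministic BPE; with p = 1 it falls back to characters. Resampling a segmentation each time a training utterance is drawn is one plausible reading of the "dynamic" augmentation in the title; the exact dropout rate and schedule used in the paper are not given in this summary.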
Findings
At least 6% relative WER improvement
25% relative F-score increase
Achieved 22.2% CER and 38.9% WER on Babel Turkish
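The first two findings are relative, not absolute, changes. Assuming the usual definition of relative improvement (the summary does not state it explicitly), a relative WER reduction is measured against the baseline error rate:

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{base}}} \times 100\%
```

For a hypothetical baseline of 40.0% WER, a 6% relative reduction is 0.06 x 40.0 = 2.4 absolute points, giving 37.6% WER. A relative F-score increase is defined analogously, with the numerator reversed because higher F-score is better.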
Abstract
With the rapid development of speech assistants, adapting server-side automatic speech recognition (ASR) solutions to run directly on devices has become crucial. Researchers and industry prefer end-to-end ASR systems for on-device speech recognition because they can be made resource-efficient while maintaining higher quality than hybrid systems. However, building end-to-end models requires a significant amount of speech data. Another challenge for speech assistants is personalization, which mainly involves handling out-of-vocabulary (OOV) words. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in the Babel Turkish and Babel Georgian tasks. To address these problems, we propose a method of dynamic acoustic unit augmentation based on the…
