Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Aleksandr Laptev; Andrei Andrusenko; Ivan Podluzhny; Anton Mitrofanov; Ivan Medennikov; Yuri Matveev

arXiv:2103.07186 · eess.AS · June 8, 2021

TL;DR

This paper introduces dynamic acoustic unit augmentation based on BPE-dropout to improve low-resource end-to-end speech recognition, particularly for out-of-vocabulary (OOV) words. It reports at least a 6% relative WER improvement and a 25% relative F-score increase.

Contribution

The proposed BPE-dropout-based augmentation improves recognition of unseen words and reduces the need to search for an optimal subword vocabulary size in low-resource ASR systems.
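
To make the idea of non-deterministic tokenization concrete, here is a minimal Python sketch of BPE-dropout on a single word. It is an illustration only, not the authors' implementation: the function name `bpe_dropout_segment`, the toy merge table `MERGES`, and the example word are all assumptions made for this example. Each candidate merge is skipped with probability `p`, so the same word can be split into different subword sequences on different training passes.

```python
import random

# Hypothetical toy merge table, ordered from highest to lowest priority.
# A real table would be learned from the training transcripts.
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]


def bpe_dropout_segment(word, merges, p=0.1, rng=None):
    """Segment `word` with BPE, dropping each candidate merge with probability p.

    p=0.0 gives ordinary deterministic BPE; p=1.0 leaves the word as characters.
    """
    rng = rng or random.Random()
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # Collect adjacent pairs that are in the merge table,
        # keeping each one only if it survives dropout.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge on this pass
        # Apply the highest-priority surviving merge (leftmost on ties).
        _, i = min(candidates)
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    # Re-tokenizing the same word several times yields varying segmentations.
    for _ in range(5):
        print(bpe_dropout_segment("lower", MERGES, p=0.3, rng=rng))
```

In a training loop, a segmentation like this would be re-sampled for every utterance each time it is seen, while decoding would typically use the deterministic setting (`p=0.0`). The dropout value of 0.3 above is arbitrary and is not taken from the paper.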

Findings

01. At least 6% relative WER improvement

02. 25% relative F-score increase

03. Achieved 22.2% CER and 38.9% WER on Turkish data

Abstract

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition tasks. This is because end-to-end systems can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Another challenging task associated with speech assistants is personalization, which mainly lies in handling out-of-vocabulary (OOV) words. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. To address the aforementioned problems, we propose a method of dynamic acoustic unit augmentation based on the…
