Sequence-based Multi-lingual Low Resource Speech Recognition
Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

TL;DR
This paper demonstrates that end-to-end sequence models trained with CTC loss improve low-resource multi-lingual speech recognition and can be adapted to new languages with limited data.
Contribution
It shows the effectiveness of end-to-end multi-lingual training of sequence models for low-resource speech recognition and cross-lingual adaptation.
Findings
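Over 6% absolute reduction in word/phoneme error rate on Babel languages, compared to mono-lingual systems built in the same setting
Cross-lingual adaptation to an unseen language using just 25% of the target data
Training on multiple languages benefits very low-resource scenarios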
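The first finding is stated as an absolute reduction, which is easy to confuse with a relative one. The sketch below computes an error rate from an edit distance and contrasts the two kinds of reduction. The function names and all numbers are hypothetical illustrations, not results or code from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (words or phonemes)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Error rate in percent: edits divided by reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline, improved):
    """Difference in percentage points between two error rates."""
    return baseline - improved


def relative_reduction(baseline, improved):
    """Reduction expressed as a percentage of the baseline error rate."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical example: one substitution and one deletion over four words.
print(error_rate("a b c d".split(), "a x c".split()))

# Hypothetical error rates, chosen only to show the difference in meaning.
print(absolute_reduction(50.0, 44.0), relative_reduction(50.0, 44.0))
```

With these made-up rates, a drop from 50% to 44% is a 6-point absolute reduction but a 12% relative reduction; the finding above is reported in the absolute sense.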
Abstract
Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data.…
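For readers unfamiliar with CTC, a model trained with it emits one label per acoustic frame, including a special blank, and the output sequence is obtained by merging consecutive repeats and then dropping blanks. The following is a minimal sketch of that collapsing rule only, not of the paper's system; the blank symbol `_` and the function name `ctc_collapse` are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level label path to an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank labels.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# A blank between the two "l" runs keeps them as separate output labels.
path = list("__hh_e_ll_l_oo")
print("".join(ctc_collapse(path)))
```

Because many frame-level paths collapse to the same output, the CTC loss sums over all of them, so training needs no frame-level alignment.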
