Adaptive multilingual speech recognition with pretrained models

Ngoc-Quan Pham; Alex Waibel; Jan Niehues

arXiv:2205.12304·cs.CL·May 26, 2022

Open Access

TL;DR

This paper combines pretrained audio and text models with adaptive weights to improve multilingual speech recognition, reporting a 44% improvement over purely supervised learning, with particular benefit for low-resource languages.

Contribution

The paper combines wav2vec 2.0 (audio) and MBART50 (text) with adaptive weights and reports substantial gains over purely supervised training.

Findings

1. 44% improvement over supervised learning

2. Different techniques reinforce different languages

3. Potential for further enhancement by architectural modifications

Abstract

Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially in many languages with limited data. Our work investigated the effectiveness of using two pretrained models for two modalities: wav2vec 2.0 for audio and MBART50 for text, together with the adaptive weight techniques to massively improve the recognition quality on the public datasets containing CommonVoice and Europarl. Overall, we noticed a 44% improvement over purely supervised learning, and more importantly, each technique provides a different reinforcement in different languages. We also explore other possibilities to potentially obtain the best model by slightly adding either depth…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing