Adapting the adapters for code-switching in multilingual ASR

Atharva Kulkarni; Ajinkya Kulkarni; Miguel Couceiro; Hanan Aldarmaki

arXiv:2310.07423 · cs.CL · October 12, 2023 · 2 cites

TL;DR

This paper adapts adapter-based multilingual speech models to code-switched speech, where two languages are mixed in the same utterance. It combines information from both language adapters at each adaptation point and models code-switching as a latent binary sequence that guides adapter information at the frame level, yielding at least a 10% absolute reduction in CER.

Contribution

The paper proposes fine-tuning techniques for the language adapters in multilingual ASR models so that they can handle code-switched speech, a setting that the per-language adapter formulation otherwise restricts.

Findings

01. Achieved at least 10% absolute reduction in CER across datasets
02. Demonstrated effective integration of language adapters for code-switching
03. Improved performance on Arabic, Mandarin, and Hindi-English datasets
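
To make the first finding concrete: CER (character error rate) is the character-level edit distance between a reference transcript and the recognizer's output, divided by the reference length, and an "absolute" reduction is a difference in percentage points, not a ratio. The sketch below shows that arithmetic on toy strings. The strings and the resulting numbers are purely illustrative and are not results from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Toy strings (hypothetical, for illustration only).
reference = "abcdefghij"
baseline_hyp = "abxxxfghij"   # 3 wrong characters out of 10
adapted_hyp = "abcdxfghij"    # 1 wrong character out of 10

baseline_cer = cer(reference, baseline_hyp)
adapted_cer = cer(reference, adapted_hyp)

absolute_reduction = baseline_cer - adapted_cer                   # in CER units
relative_reduction = (baseline_cer - adapted_cer) / baseline_cer  # fraction of baseline

print(f"baseline CER  = {baseline_cer:.1%}")
print(f"adapted CER   = {adapted_cer:.1%}")
print(f"absolute drop = {absolute_reduction:.1%} points")
print(f"relative drop = {relative_reduction:.1%}")
```

The distinction matters when reading the headline number: a 10-point absolute drop is a much larger relative gain when the baseline CER is low than when it is high.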

Abstract

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also model code-switching as a sequence of latent binary sequences that can be used to guide the flow of information from each language adapter at the frame level. The proposed approaches…

Code & Models

Repositories

atharva7k/mms-code-switching · jax · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems

Methods: Adapter