A Parameter-efficient Language Extension Framework for Multilingual ASR

Wei Liu; Jingyong Hou; Dong Yang; Muyong Cao; Tan Lee

arXiv:2406.06329·cs.CL·June 11, 2024

Open Access

TL;DR

This paper introduces PELE, a parameter-efficient framework for extending multilingual speech recognition models to new languages, effectively addressing catastrophic forgetting and demonstrating superior performance with minimal additional parameters.

Contribution

The paper proposes a novel architecture-based framework, PELE, that enables efficient language extension in MASR models using parameter-efficient fine-tuning modules, outperforming traditional methods.

Findings

01 PELE effectively incorporates new languages with minimal performance loss.

02 Adapter-based PEFT modules outperform weight or input feature-based methods.

03 Experiments on 5 languages show superior results over joint learning approaches.
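
To make the adapter idea in these findings concrete, here is a minimal pure-Python sketch of a frozen shared layer with one small residual adapter per added language. It is an illustration under assumptions, not the paper's implementation: the class names `BottleneckAdapter` and `ExtensibleLayer`, the language codes "en" and "cy", the layer sizes, and the zero-initialised up-projection are all hypothetical choices made for this example.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

class BottleneckAdapter:
    """Generic residual adapter (hypothetical): h + W_up * relu(W_down * h)."""
    def __init__(self, dim, bottleneck, rng):
        self.W_down = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Assumption: zero-initialised up-projection, so a fresh adapter
        # behaves as the identity until it is trained.
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.W_up, relu(matvec(self.W_down, h)))
        return [a + d for a, d in zip(h, delta)]

    def num_params(self):
        return sum(len(r) for r in self.W_down) + sum(len(r) for r in self.W_up)

class ExtensibleLayer:
    """Frozen shared weights plus one add-on adapter per new language."""
    def __init__(self, dim, rng):
        self.dim = dim
        # Shared weights; never modified after construction in this sketch.
        self.W_base = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(dim)]
        self.adapters = {}

    def add_language(self, lang, bottleneck, rng):
        self.adapters[lang] = BottleneckAdapter(self.dim, bottleneck, rng)

    def forward(self, x, lang):
        h = matvec(self.W_base, x)
        adapter = self.adapters.get(lang)
        # Languages without an adapter use the frozen path unchanged.
        return adapter(h) if adapter is not None else h

rng = random.Random(0)
layer = ExtensibleLayer(dim=8, rng=rng)
x = [rng.gauss(0, 1) for _ in range(8)]

before = layer.forward(x, "en")        # "en": hypothetical base language
layer.add_language("cy", bottleneck=2, rng=rng)
# Stand-in for training: overwrite the new adapter's up-projection.
layer.adapters["cy"].W_up = [[rng.gauss(0, 0.1) for _ in range(2)]
                             for _ in range(8)]
after = layer.forward(x, "en")

print(before == after)                       # base-language output is untouched
print(layer.adapters["cy"].num_params())     # 8*2 + 2*8 adapter parameters
```

Because the shared weights are never updated and each language routes only through its own adapter, adding a language in this sketch cannot change the outputs for existing languages, and the added parameter count (32 here, against 64 shared weights) depends only on the bottleneck width.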

Abstract

Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framework for language extension that can fundamentally solve catastrophic forgetting, dubbed PELE. PELE is designed to be parameter-efficient, incrementally incorporating an add-on module to adapt to a new language. Specifically, different parameter-efficient fine-tuning (PEFT) modules and their variants are explored as potential candidates to perform XLA. Experiments are carried out on 5 new languages with a wide range of low-resource data sizes. The best-performing PEFT candidate can achieve…
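
One plausible way to write the LP/XLA decomposition mentioned in the abstract is shown below. The notation is an assumption made for illustration, since the abstract gives no formula: $x$ is the input speech, $y$ the transcript, and $l$ the language identity.

```latex
% Assumed notation: x = speech input, y = transcript, l = language identity.
\begin{align}
  p(y \mid x) &= \sum_{l} \underbrace{p(l \mid x)}_{\text{LP}}\;
                 \underbrace{p(y \mid x, l)}_{\text{XLA}} \\
  \hat{l} &= \arg\max_{l}\, p(l \mid x), \qquad
  \hat{y} = \arg\max_{y}\, p(y \mid x, \hat{l})
\end{align}
```

The second line is a hard-decision simplification, also an assumption: the predicted language selects which language-specific module handles the utterance, which is one way an architecture-based approach could keep existing languages isolated from newly added ones.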

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Service-Oriented Architecture and Web Services

Methods: Adapter