Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

Zhongzhi Yu; Yang Zhang; Kaizhi Qian; Yonggan Fu; Yingyan Lin

arXiv:2306.15686 · eess.AS · June 29, 2023 · 1 citation

TL;DR

Master-ASR introduces a modular learning framework that improves multilingual scalability and low-resource adaptation in automatic speech recognition (ASR) by learning a small set of generalizable sub-modules shared across languages and adaptively assembling them for each language, outperforming state-of-the-art methods.

Contribution

It proposes a novel modular ASR framework that simultaneously improves multilingual scalability and low-resource adaptation through a learnable modularize-then-assemble strategy.

Findings

01

Achieves a 0.13-2.41 lower character error rate (CER) than state-of-the-art (SOTA) solutions on multilingual ASR, with 30% less inference overhead.

02

Uses nearly 50 times fewer trainable parameters than SOTA solutions in low-resource tuning.

03

Effectively discovers language similarities and improves multilingual and low-resource ASR performance over SOTA methods.
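The first finding is reported in character error rate (CER). For reference, CER is the character-level edit (Levenshtein) distance between a recognized transcript and its reference, divided by the number of reference characters. The sketch below implements that standard definition in plain Python; the function names `levenshtein` and `cer` are our own, and this is not the paper's evaluation code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


print(cer("speech", "spech"))  # 1 edit / 6 reference characters
```

Benchmarks usually report corpus-level CER by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance scores.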

Abstract

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over-fitting and catastrophic forgetting issues. Inspired by recent findings, we hypothesize that we can address the above challenges with modules widely shared across languages. To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, Master-ASR learns a small set of generalizable sub-modules and…
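The abstract describes learning a small set of generalizable sub-modules shared across languages and assembling them per language. The sketch below illustrates only that general modularize-then-assemble idea under our own assumptions; it is not Master-ASR's actual architecture or selection procedure. It assumes a single layer holding a shared pool of linear sub-modules (the class `SharedModulePool` is hypothetical), one score vector per language, and a simple top-k rule that averages the outputs of the k highest-scoring sub-modules. How the scores are learned (for example, through a differentiable relaxation of top-k selection) is out of scope here, so the example sets them by hand.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Apply a linear sub-module: y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class SharedModulePool:
    """Hypothetical single layer whose sub-modules are shared by every language."""

    def __init__(self, num_modules: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Shared weights: num_modules small linear maps of shape (dim, dim).
        self.modules: list[Matrix] = [
            [[rng.gauss(0.0, 1.0) / dim ** 0.5 for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_modules)
        ]
        # Per-language scores over the pool (assumed to be learned elsewhere).
        self.scores: dict[str, Vector] = {}

    def add_language(self, lang: str) -> None:
        # A new language adds one score per sub-module; shared weights are untouched.
        self.scores[lang] = [0.0] * len(self.modules)

    def assemble(self, lang: str, k: int) -> list[int]:
        """Indices of the k highest-scoring sub-modules for this language."""
        s = self.scores[lang]
        return sorted(range(len(s)), key=lambda m: -s[m])[:k]

    def forward(self, x: Vector, lang: str, k: int = 2) -> Vector:
        """Average the outputs of the sub-modules assembled for this language."""
        outputs = [matvec(self.modules[m], x) for m in self.assemble(lang, k)]
        return [sum(col) / len(outputs) for col in zip(*outputs)]

    def num_shared_params(self) -> int:
        return sum(len(w) * len(w[0]) for w in self.modules)


pool = SharedModulePool(num_modules=8, dim=4)
pool.add_language("sw")  # hypothetical new low-resource language
pool.scores["sw"][5], pool.scores["sw"][2] = 1.0, 0.5  # stand-in for learned scores
print(pool.assemble("sw", k=2))  # [5, 2]
print(len(pool.forward([1.0, 0.0, 0.0, 0.0], "sw")))  # 4
print(pool.num_shared_params(), len(pool.scores["sw"]))  # 128 8
```

In this toy setup, adapting to a new language trains only its 8 scores while the 128 shared weights stay frozen, which hints at how a modular design can reduce trainable parameters during low-resource tuning; the paper's actual ratio comes from its own architecture, not from this sketch.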

Videos

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning (SlidesLive)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing