Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks

Yizhou Lu; Mingkun Huang; Xinghua Qu; Pengfei Wei; Zejun Ma

arXiv:2203.04583·eess.AS·March 10, 2022

TL;DR

This paper introduces a language adaptive pre-training method for cross-lingual speech representation learning (XLSR) models based on sparse sharing sub-networks, and reports improved multilingual speech recognition performance for both high-resource and low-resource languages.

Contribution

The paper proposes a sparse sharing sub-network approach to language adaptive training of XLSR models, which reduces language interference and improves performance without any manually designed language-specific components.

Findings

01 Outperforms baseline XLSR models on multilingual speech recognition

02 Requires fewer parameters than existing adaptation methods

03 Effective for both high-resource and low-resource languages

Abstract

Unsupervised cross-lingual speech representation learning (XLSR) has recently shown promising results in speech recognition by leveraging vast amounts of unlabeled data across multiple languages. However, the standard XLSR model suffers from a language interference problem because it lacks language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models. More importantly, we propose a novel language adaptive pre-training approach based on sparse sharing sub-networks. It makes room for language-specific modeling by pruning out the parameters that are unimportant for each language, without requiring any manually designed language-specific component. After pruning, each language maintains only a sparse sub-network, and the sub-networks are partially shared with each other. Experimental results on a downstream multilingual speech recognition task show that our…
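
The sparse sharing idea described in the abstract can be illustrated with a toy sketch. The abstract does not say how parameter importance is measured, so the example below assumes magnitude-based pruning as a stand-in criterion. The function names (`magnitude_mask`, `overlap`, `masked_update`), the weight values, and the language codes are all hypothetical and are not taken from the paper. Each language gets a binary mask over one shared parameter vector, the masks overlap only partially, and a gradient step for one language touches only the parameters inside its own sub-network.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    ASSUMPTION: magnitude is used as the importance score here only for
    illustration; the source does not specify the pruning criterion.
    """
    n_keep = len(weights) - int(len(weights) * sparsity)
    # Sort indices by descending magnitude; ties are broken by index.
    order = sorted(range(len(weights)), key=lambda i: (-abs(weights[i]), i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def overlap(mask_a, mask_b):
    """Intersection over union of the kept positions of two masks."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 0.0


def masked_update(shared, mask, grads, lr):
    """Gradient step restricted to one language's sub-network."""
    return [w - lr * g * m for w, g, m in zip(shared, grads, mask)]


# Hypothetical shared parameters and per-language adapted copies.
shared = [0.9, -0.1, 0.4, 0.05, -0.7, 0.3]
lang_weights = {
    "en": [0.9, -0.1, 0.6, 0.05, -0.7, 0.1],
    "zh": [0.8, -0.5, 0.1, 0.05, -0.9, 0.3],
}

# One sparse mask per language, each keeping half of the parameters.
masks = {lang: magnitude_mask(w, sparsity=0.5) for lang, w in lang_weights.items()}
print(masks["en"])  # [1, 0, 1, 0, 1, 0]
print(masks["zh"])  # [1, 1, 0, 0, 1, 0]

# The sub-networks are partially shared: positions 0 and 4 are in both.
print(overlap(masks["en"], masks["zh"]))  # 0.5

# An "en" batch updates only the parameters inside the "en" sub-network.
updated = masked_update(shared, masks["en"], grads=[1.0] * 6, lr=0.1)
```

In this toy setting, positions kept by both masks receive updates from both languages, while positions kept by only one mask act as language-specific capacity. No separate adapter module has to be designed by hand.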

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques

Methods: Pruning · XLSR