Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

Fenglin Ding; Genshun Wan; Pengcheng Li; Jia Pan; Cong Liu

arXiv:2212.03476·eess.AS·December 8, 2022

TL;DR

This paper improves self-supervised multilingual speech representation learning by using auxiliary language information during pre-training, yielding a 14.3% relative gain over XLSR on a 16-language ASR task.

Contribution

The paper introduces techniques that exploit language information during multilingual pre-training, including language adversarial training, language embedding, and language adaptive training.
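This page does not say how the language adversarial training is implemented. A common way to realize adversarial training against a classifier is a gradient reversal layer, sketched below purely as an assumption: the function names and the `lam` scale are hypothetical and are not taken from the paper.

```python
from typing import List


def grl_forward(features: List[float]) -> List[float]:
    """Forward pass: pass the encoder features through unchanged."""
    return list(features)


def grl_backward(grad_from_classifier: List[float], lam: float = 1.0) -> List[float]:
    """Backward pass: flip the sign of the language-classifier gradient.

    The encoder then receives a gradient that pushes it to make the
    language harder to predict, while the classifier itself still
    trains normally. `lam` (hypothetical) scales the reversed gradient.
    """
    return [-lam * g for g in grad_from_classifier]


# Illustrative values only.
features = [0.3, -1.2, 0.8]
print(grl_forward(features))
print(grl_backward([0.5, -1.0, 0.0], lam=2.0))
```

In a real framework the same idea would live in a custom autograd function placed between the encoder and a language classifier.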

Findings

01 Achieved 14.3% relative gain over XLSR

02 Achieved 19.8% relative gain over no pre-training

03 Demonstrated effectiveness on 16-language ASR task
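The page does not state which metric the relative gains are computed on. Assuming they are relative reductions in an error rate such as WER or CER, as is usual for ASR, the arithmetic can be sketched as follows. The function name and the error rates in the example are hypothetical and are not results from the paper.

```python
def relative_gain(baseline_error: float, improved_error: float) -> float:
    """Return the relative error reduction, as a percentage of the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Made-up error rates, chosen only to illustrate the arithmetic.
baseline = 10.0   # hypothetical baseline error rate (%)
improved = 8.57   # hypothetical error rate after the improvement (%)
print(round(relative_gain(baseline, improved), 1))
```

A relative gain is measured against the baseline's own error rate, so the same percentage can correspond to very different absolute improvements.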

Abstract

Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods for speech, self-supervised multilingual speech representation learning such as XLSR has succeeded in improving the performance of multilingual automatic speech recognition (ASR). However, as in supervised learning, multilingual pre-training may also suffer from language interference, which in turn affects the application of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embedding, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate 14.3% relative gain over…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: XLSR