LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Jinchuan Tian; Jianwei Yu; Chunlei Zhang; Chao Weng; Yuexian Zou; Dong Yu

arXiv:2206.02093·cs.CL·June 7, 2022

TL;DR

This paper introduces a language-aware encoder (LAE) architecture that handles both monolingual and multilingual speech recognition in a single system by disentangling language-specific information during encoding. It reports improvements on code-switched ASR for both CTC and neural transducer systems.

Contribution

The paper proposes the LAE architecture, which pairs a shared block with language-specific blocks, together with a training method, enabling unified recognition of monolingual and multilingual speech in a single ASR system.

Findings

01 LAE discriminates between languages at the frame level

02 LAE achieves significant improvements on code-switched ASR tasks

03 LAE outperforms previous models on both CTC and neural transducer systems

Abstract

Despite the rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech using a unified ASR system remains highly challenging. Previous works on multilingual speech recognition mainly focus on two directions: recognizing multiple monolingual speech or recognizing code-switched speech that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by the shared block while the language-specific blocks are used to extract specific representations for each language. To learn…
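The shared-block and language-specific-block layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ToyLanguageAwareEncoder`, the language tags `"zh"` and `"en"`, the use of a single linear map plus ReLU as a stand-in for each block, and the element-wise sum used to fuse the branches are all assumptions made purely for illustration.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix stored as nested lists."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_block(weights, frame):
    """Stand-in for an encoder block: linear map followed by ReLU on one frame."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]


class ToyLanguageAwareEncoder:
    """Hypothetical toy model: one shared block feeding one block per language."""

    def __init__(self, feat_dim, hidden_dim, languages, seed=0):
        rng = random.Random(seed)
        self.shared = make_linear(feat_dim, hidden_dim, rng)
        self.specific = {
            lang: make_linear(hidden_dim, hidden_dim, rng) for lang in languages
        }

    def encode(self, frames):
        """Return fused frame-level outputs and the per-language outputs."""
        per_language = {lang: [] for lang in self.specific}
        fused = []
        for frame in frames:
            # Primary encoding by the shared block.
            shared_out = apply_block(self.shared, frame)
            # Each language-specific block sees the same shared representation.
            outs = [apply_block(w, shared_out) for w in self.specific.values()]
            for lang, out in zip(self.specific, outs):
                per_language[lang].append(out)
            # Assumption: branches are fused by element-wise sum.
            fused.append([sum(values) for values in zip(*outs)])
        return fused, per_language


# Usage: 5 random frames of 4 features each, two hypothetical languages.
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
encoder = ToyLanguageAwareEncoder(feat_dim=4, hidden_dim=3, languages=["zh", "en"])
fused, per_language = encoder.encode(frames)
print(len(fused), len(fused[0]), sorted(per_language))
```

Keeping the per-language outputs separate from the fused output is what would let each branch be supervised or inspected frame by frame; how the paper actually trains and combines the branches is not covered by the truncated abstract above.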

Code & Models

Repositories

jctian98/e2e_lfmmi (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and Dialogue Systems