LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

TL;DR
This paper introduces a language-aware encoder (LAE) architecture that handles both monolingual and code-switched speech in a single ASR system by disentangling language-specific information during encoding, improving performance on both kinds of task.
Contribution
The paper proposes a new LAE architecture with language-specific blocks and a training method, enabling unified recognition of monolingual and multilingual speech in ASR systems.
Findings
LAE discriminates different languages at frame-level
LAE achieves significant improvements on code-switched ASR tasks
LAE outperforms previous models on both CTC and neural transducer systems
Abstract
Despite the rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech using a unified ASR system remains highly challenging. Previous works on multilingual speech recognition mainly focus on two directions: recognizing multiple monolingual speech or recognizing code-switched speech that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by the shared block while the language-specific blocks are used to extract specific representations for each language. To learn…
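The split between a shared block and language-specific blocks described in the abstract can be pictured with a small toy sketch. Everything in it is an assumption made for illustration: the class name `ToyLanguageAwareEncoder`, the single-matrix "blocks", the language labels `lang_a` and `lang_b`, and in particular the fusion by summation, which the text above does not specify. It is not the authors' implementation and includes no training.

```python
import random


def linear(frames, weight):
    """Apply a (d_in x d_out) weight matrix to every frame independently."""
    return [[sum(x * w for x, w in zip(frame, col)) for col in zip(*weight)]
            for frame in frames]


def relu(frames):
    """Element-wise ReLU over a list of frames."""
    return [[max(0.0, v) for v in frame] for frame in frames]


def random_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for a trained block."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyLanguageAwareEncoder:
    """Hypothetical stand-in: one shared block, one block per language."""

    def __init__(self, dim, languages, seed=0):
        rng = random.Random(seed)
        # Shared block: does the primary encoding for all languages.
        self.shared = random_matrix(dim, dim, rng)
        # Language-specific blocks: one per language, all fed by the shared block.
        self.specific = {lang: random_matrix(dim, dim, rng) for lang in languages}

    def forward(self, frames):
        shared_out = relu(linear(frames, self.shared))
        # One frame-level representation per language.
        per_language = {lang: relu(linear(shared_out, w))
                        for lang, w in self.specific.items()}
        # ASSUMPTION: fuse the per-language outputs by frame-wise summation.
        fused = [[sum(vals) for vals in zip(*rows)]
                 for rows in zip(*per_language.values())]
        return fused, per_language


# Usage: 5 frames of 4-dimensional features, two hypothetical languages.
encoder = ToyLanguageAwareEncoder(dim=4, languages=["lang_a", "lang_b"])
frames = [[0.1 * (t + i) for i in range(4)] for t in range(5)]
fused, per_language = encoder.forward(frames)
```

Keeping the per-language outputs alongside the fused one reflects the frame-level view in the abstract: each frame has a separate representation for every language, so a downstream component can inspect them individually.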
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
