Reducing language context confusion for end-to-end code-switching automatic speech recognition

Shuai Zhang; Jiangyan Yi; Zhengkun Tian; Jianhua Tao; Yu Ting Yeung; Liqun Deng

arXiv:2201.12155 · cs.CL · June 30, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a linguistically motivated attention mechanism for end-to-end code-switching speech recognition. Grounded in the Equivalence Constraint theory, the mechanism leverages monolingual data to reduce multilingual context confusion and improve recognition accuracy.

Contribution

It proposes a novel language-related attention mechanism grounded in linguistic theory to enhance code-switching ASR performance by transferring knowledge from monolingual data.

Findings

1. Achieved a 17.12% relative error reduction over the baseline
2. Effectively transfers monolingual language knowledge
3. Reduces multilingual context confusion

Abstract

Code-switching is the alternation between languages within a single communication. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging because code-switching training data are always insufficient to combat the increased multilingual context confusion caused by the presence of more than one language. We propose a language-related attention mechanism, based on the Equivalence Constraint (EC) Theory, to reduce multilingual context confusion in the E2E code-switching ASR model. The theory requires that any monolingual fragment that occurs in a code-switching sentence must also occur in one of the monolingual sentences, which establishes a bridge between monolingual data and code-switching data. We leverage this linguistic theory to design the code-switching E2E ASR model. The proposed model efficiently transfers language knowledge…
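The constraint stated in the abstract can be illustrated with a small check over tokenized text. This is only a sketch of the fragment rule as worded above, not the paper's attention mechanism: the function names, the `(token, language)` tagging scheme, and the toy sentences are all assumptions made for illustration.

```python
from itertools import groupby


def monolingual_fragments(tagged_tokens):
    """Split [(token, lang), ...] into maximal runs of same-language tokens."""
    return [
        (lang, [tok for tok, _ in run])
        for lang, run in groupby(tagged_tokens, key=lambda pair: pair[1])
    ]


def occurs_in(fragment, sentence):
    """Return True if `fragment` is a contiguous sub-sequence of `sentence`."""
    n = len(fragment)
    return any(sentence[i:i + n] == fragment for i in range(len(sentence) - n + 1))


def satisfies_constraint(tagged_tokens, monolingual_corpus):
    """Check that every monolingual fragment appears in some monolingual
    sentence of the same language (hypothetical helper, toy-scale only)."""
    return all(
        any(occurs_in(frag, sent) for sent in monolingual_corpus.get(lang, []))
        for lang, frag in monolingual_fragments(tagged_tokens)
    )


# Toy data (invented for illustration): one code-switching sentence and a
# tiny monolingual corpus keyed by language tag.
cs_sentence = [("我", "zh"), ("想", "zh"), ("watch", "en"), ("a", "en"), ("movie", "en")]
corpus = {
    "zh": [["我", "想", "睡觉"]],
    "en": [["let", "us", "watch", "a", "movie"]],
}

print(satisfies_constraint(cs_sentence, corpus))
```

Here the sentence splits into one Mandarin fragment and one English fragment, and each is looked up only in the corpus of its own language. A real system would work on acoustic and subword representations rather than exact token matches.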

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Text Readability and Simplification