Language-specific Characteristic Assistance for Code-switching Speech Recognition

Tongtong Song; Qiang Xu; Meng Ge; Longbiao Wang; Hao Shi; Yongjie Lv; Yuqin Lin; Jianwu Dang

arXiv:2206.14580·cs.CL·July 13, 2022

TL;DR

This paper introduces a language-specific characteristic assistance (LSCA) method for code-switching speech recognition, leveraging language constraints and pre-trained models to improve accuracy without extra shared parameters.

Contribution

The paper proposes LSCA, a novel approach that incorporates language constraints during training and decoding, enhancing code-switching speech recognition performance using pre-trained language-specific models.

Findings

1. Up to 15.4% relative error reduction on the code-switching test set.

2. Both the training and the decoding methods of LSCA improve performance.

3. Effective recognition without extra shared parameters or retraining.
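
For readers unfamiliar with the metric in the first finding, a relative error reduction compares an error rate against a baseline as (baseline - new) / baseline. The sketch below shows only the arithmetic. The function name and the two error rates are made-up illustrations chosen to land on 15.4%; they are not figures reported in the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fractional reduction of new_error relative to baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), NOT results reported in the paper.
baseline = 10.0
improved = 8.46
print(f"{relative_error_reduction(baseline, improved):.1%}")
```

A relative reduction is larger than the absolute gap it corresponds to: in this toy case the absolute difference is 1.54 points, while the relative reduction is about 15.4%.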

Abstract

The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition. Because the LSEs are initialized from two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit sufficient monolingual data and capture the attributes of each individual language. However, most existing methods place no language constraints on the LSEs and underutilize the language-specific knowledge of the LSMs. In this paper, we propose a language-specific characteristic assistance (LSCA) method to mitigate these problems. Specifically, during training, we introduce two language-specific losses as language constraints and generate corresponding language-specific targets for them. During decoding, we take the decoding abilities of the LSMs into account by combining the output probabilities of the two LSMs and the mixture model to obtain the final predictions.…
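
The decoding step described in the abstract combines the output probabilities of three models: the two LSMs and the mixture model. The abstract as shown here does not state the combination rule, so the following is only a minimal sketch that assumes a weighted linear interpolation over a shared token set. The function name, the weights, and the toy distributions are all hypothetical and should not be read as the authors' implementation.

```python
from typing import Dict, Tuple


def combine_token_probs(
    mixture: Dict[str, float],
    lsm_first: Dict[str, float],
    lsm_second: Dict[str, float],
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed values
) -> Dict[str, float]:
    """Linearly interpolate three per-token distributions and renormalize.

    Tokens missing from a model's distribution are treated as probability 0,
    which is an assumption made for this illustration.
    """
    w_mix, w_first, w_second = weights
    vocab = set(mixture) | set(lsm_first) | set(lsm_second)
    scores = {
        tok: w_mix * mixture.get(tok, 0.0)
        + w_first * lsm_first.get(tok, 0.0)
        + w_second * lsm_second.get(tok, 0.0)
        for tok in vocab
    }
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("combined scores sum to zero; check inputs and weights")
    return {tok: score / total for tok, score in scores.items()}


# Toy distributions for one decoding step (illustrative tokens only).
mixture_probs = {"tok_a": 0.5, "tok_b": 0.4, "<unk>": 0.1}
first_lsm_probs = {"tok_a": 0.9, "<unk>": 0.1}   # model for the first language
second_lsm_probs = {"tok_b": 0.7, "<unk>": 0.3}  # model for the second language

combined = combine_token_probs(mixture_probs, first_lsm_probs, second_lsm_probs)
best_token = max(combined, key=combined.get)
print(best_token, round(combined[best_token], 3))
```

Linear interpolation is one of several plausible choices; a log-linear combination of scores would be an equally reasonable assumption, and the paper itself should be consulted for the rule actually used.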

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques

Methods: Test