Monolingual Recognizers Fusion for Code-switching Speech Recognition

Tongtong Song; Qiang Xu; Haoyu Lu; Longbiao Wang; Hao Shi; Yuqin Lin; Yanbing Yang; Jianwu Dang

arXiv:2211.01046·eess.AS·November 3, 2022·1 cites

Open Access

TL;DR

This paper introduces a monolingual recognizers fusion approach for code-switching speech recognition. Two independent monolingual ASR models each produce a language-specific prediction, and a second stage fuses the two predictions into the final output.

Contribution

It proposes a two-stage fusion method with language-aware training and a text simulation strategy, enabling better use of pre-trained monolingual models for code-switching ASR.

Findings

01. Significant reduction in mix error rate on a Mandarin-English corpus

02. Effective fusion of monolingual models improves code-switching recognition

03. The proposed method outperforms existing bi-encoder approaches
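
The findings above are stated in terms of mix error rate. As a rough illustration only, the sketch below computes such a rate under the common convention for Mandarin-English data: each Mandarin character and each English word counts as one token, and the rate is the token-level edit distance divided by the reference length. The tokenization rule and the function names (`tokenize_mixed`, `edit_distance`, `mix_error_rate`) are assumptions made for this example, not the paper's scoring script.

```python
import re

# Assumption: one token per CJK character, one token per run of Latin letters.
# Digits and punctuation are ignored in this simplified sketch.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize_mixed(text):
    """Split mixed Mandarin-English text into characters and lowercase words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (row-by-row dynamic program)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mix_error_rate(refs, hyps):
    """Total token errors divided by total reference tokens over a corpus."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize_mixed(ref)
        errors += edit_distance(ref_tokens, tokenize_mixed(hyp))
        total += len(ref_tokens)
    return errors / total if total else 0.0


# One substituted English word out of five reference tokens.
print(mix_error_rate(["我想 play 篮球"], ["我想 pay 篮球"]))  # prints 0.2
```

Counting Mandarin by character and English by word keeps the two languages on comparable footing without requiring a Mandarin word segmenter.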

Abstract

The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the two monolingual ASR models (MAMs) to share the same structure, and they use only the encoders of the MAMs. As a result, pre-trained MAMs cannot be promptly and fully exploited for CS ASR. In this paper, we propose a monolingual recognizers fusion method for CS ASR. It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage. In the SA stage, acoustic features are mapped to two language-specific predictions by two independent MAMs. To keep the MAMs focused on their own language, we further extend the language-aware training strategy for the MAMs. In the LF stage, the BELM fuses the two language-specific predictions to get the final prediction. Moreover, we propose a text simulation strategy to simplify…
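
The two-stage data flow described in the abstract can be pictured with a toy sketch. Everything here is hypothetical: the `Token` type, the `<unk>` placeholder that each recognizer emits outside its own language, the position-aligned outputs, and the confidence-based `fuse_by_confidence` rule. In particular, the fusion rule is a deliberately simple stand-in and is not the BELM used in the paper's LF stage; the sketch only shows that two independent recognizers see the same features and that a separate step merges their predictions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Token:
    text: str
    score: float  # hypothetical confidence in [0, 1]


# A recognizer maps acoustic features to a token sequence (toy signature).
Recognizer = Callable[[Sequence[float]], List[Token]]


def speech_awareness(features: Sequence[float],
                     mandarin_asr: Recognizer,
                     english_asr: Recognizer) -> Tuple[List[Token], List[Token]]:
    """SA stage: run both monolingual recognizers independently on the same input."""
    return mandarin_asr(features), english_asr(features)


def fuse_by_confidence(zh: List[Token], en: List[Token],
                       placeholder: str = "<unk>") -> List[str]:
    """Stand-in for the LF stage (NOT the BELM): at each aligned position keep
    the more confident token and drop placeholders. Assumes equal-length,
    position-aligned outputs, which real recognizers do not guarantee."""
    fused = []
    for z, e in zip(zh, en):
        best = z if z.score >= e.score else e
        if best.text != placeholder:
            fused.append(best.text)
    return fused


# Toy recognizers with hard-coded outputs, for illustration only.
def toy_mandarin(features: Sequence[float]) -> List[Token]:
    return [Token("我", 0.9), Token("想", 0.8), Token("<unk>", 0.3)]


def toy_english(features: Sequence[float]) -> List[Token]:
    return [Token("<unk>", 0.2), Token("<unk>", 0.1), Token("play", 0.95)]


zh, en = speech_awareness([0.0, 0.0, 0.0], toy_mandarin, toy_english)
print(fuse_by_confidence(zh, en))  # prints ['我', '想', 'play']
```

Because the two recognizers are only called through a shared function signature, they need not share an architecture, which mirrors the motivation stated in the abstract.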

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research

Methods: Test