Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition

Zhiyun Fan; Linhao Dong; Chen Shen; Zhenlin Liang; Jun Zhang; Lu Lu; Zejun Ma

arXiv:2306.05279 · cs.SD · June 9, 2023 · 2 cites

TL;DR

This paper introduces a novel language-specific acoustic boundary learning approach for Mandarin-English code-switching speech recognition, employing language-specific weight estimators, a non-autoregressive decoder, and a change detection module to improve accuracy.

Contribution

It proposes a new method combining language-specific boundary modeling, a non-autoregressive decoder, and change detection for improved code-switching speech recognition.

Findings

01

Achieved state-of-the-art MERs of 16.29% and 22.81% on the SEAME test_man and test_sge sets, respectively.

02

Reduced MER by 7.9% on a large in-house dataset.

03

Demonstrated effectiveness on both the public SEAME corpus and a large in-house dataset.

Abstract

Code-switching speech recognition (CSSR) transcribes speech that switches between multiple languages or dialects within a single sentence. The main challenge in this task is that different languages often have similar pronunciations, making it difficult for models to distinguish between them. In this paper, we propose a method for solving the CSSR task from the perspective of language-specific acoustic boundary learning. We introduce language-specific weight estimators (LSWE) to model acoustic boundary learning in different languages separately. Additionally, a non-autoregressive (NAR) decoder and a language change detection (LCD) module are employed to assist in training. Evaluated on the SEAME corpus, our method achieves state-of-the-art mixed error rates (MER) of 16.29% and 22.81% on the test_man and test_sge sets, respectively. We also demonstrate the effectiveness of our method on a 9000-hour…
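The abstract reports results as mixed error rate (MER) without defining it. For Mandarin-English speech, MER is conventionally an edit-distance error rate over a mixed token sequence, with Mandarin counted per character and English per word. The sketch below illustrates that convention only. The function names and the tokenization rule are assumptions made for this example, and the authors' scoring script and text normalization may differ.

```python
import re

# Assumed tokenization rule (not taken from the paper): each CJK character
# is one token; each run of Latin letters, digits, or apostrophes is one token.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text):
    """Split a Mandarin-English string into characters and lowercased words."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize_mixed(ref)
        errors += edit_distance(ref_tokens, tokenize_mixed(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total
```

For instance, scoring the hypothesis `我想去 shopping more` against the reference `我想去 shopping mall` gives one substitution over five reference tokens, so an MER of 20%.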

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research