Towards End-to-End Code-Switching Speech Recognition

Ne Luo; Dongwei Jiang; Shuaijiang Zhao; Caixia Gong; Wei Zou; Xiangang Li

arXiv:1810.13091·cs.CL·November 2, 2018·43 cites

TL;DR

This paper develops an end-to-end Mandarin-English code-switching speech recognition system based on a hybrid CTC-Attention model. The approach removes the need for expert linguistic knowledge, and the paper compares modeling units, language identification, and decoding strategies, reaching a 34.24% mixed error rate (MER) on SEAME.

Contribution

It introduces a hybrid CTC-Attention end-to-end model for code-switching ASR and studies the impact of modeling units, language ID, and decoding strategies.
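For orientation, hybrid CTC-Attention models are commonly trained with a multi-task objective that interpolates the two losses. The formulation below is that standard one, given as a sketch; the symbols and the interpolation weight $\lambda$ are assumptions for illustration, since this summary does not state the exact objective or the weight the authors used.

```latex
\mathcal{L}_{\mathrm{MTL}}
  = \lambda \, \mathcal{L}_{\mathrm{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\mathrm{Att}},
\qquad 0 \le \lambda \le 1
```

Setting $\lambda = 0$ recovers a pure attention model and $\lambda = 1$ a pure CTC model, so the weight controls how strongly the CTC alignment constraint regularizes the attention decoder.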

Findings

01. The hybrid CTC-Attention model improves recognition accuracy.

02. Including language identification enhances system performance.

03. The system achieves a 34.24% mixed error rate (MER) on the SEAME corpus.

Abstract

Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a major obstacle. End-to-end automatic speech recognition (ASR) considerably simplifies the building of ASR systems by predicting graphemes or characters directly from acoustic input. It also eliminates the need for expert linguistic knowledge, which makes it an attractive choice for code-switching ASR. This paper presents a hybrid CTC-Attention based end-to-end Mandarin-English code-switching (CS) speech recognition system and studies the effect of hybrid CTC-Attention based models, different modeling units, the inclusion of language identification, and different decoding strategies on the task of code-switching ASR. On the SEAME corpus, our system achieves a mixed error rate (MER) of 34.24%.
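To make the reported metric concrete: MER for Mandarin-English speech is usually computed as an edit-distance error rate in which each Mandarin character and each English word counts as one token. The sketch below follows that common convention; the function names, the tokenization regex, and the example sentences are illustrative assumptions and not the authors' scoring script.

```python
import re

# Assumption: one token per CJK character (basic Unified Ideographs block),
# one token per run of Latin letters/apostrophes. Digits and punctuation
# are ignored by this simplified tokenizer.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def mixed_tokens(text):
    """Split a code-switched transcript into Mandarin characters and English words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = mixed_tokens(ref), mixed_tokens(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0


if __name__ == "__main__":
    refs = ["我想 order 一杯 coffee"]
    hyps = ["我想 order 一杯 copy"]
    print(f"MER = {mixed_error_rate(refs, hyps):.4f}")
```

In the usage example the reference has six tokens (four characters and two words) and the hypothesis contains one substituted word, so the score is one error over six reference tokens.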

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems