The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results

Xian Shi; Qiangze Feng; Lei Xie

arXiv:2007.05916·eess.AS·July 14, 2020·20 cites

Open Access

TL;DR

This paper reports on the ASRU 2019 Mandarin-English code-switching speech recognition challenge, providing datasets, tracks, methods, and results to advance recognition performance in a complex multilingual setting.

Contribution

It introduces a standardized benchmark with datasets and tracks for Mandarin-English code-switching speech recognition, and analyzes various modeling approaches and their effectiveness.

Findings

01. Traditional systems benefit from lexicon, data augmentation, and CS text generation.

02. E2E models improve with language ID, modeling units, and SpecAugment.

03. Results highlight key techniques for improving CS speech recognition.

Abstract

Code-switching (CS) is a common phenomenon, and recognizing CS speech is challenging. However, CS speech data is scarce and there is no common testbed for research in this area. This paper describes the design and main outcomes of the ASRU 2019 Mandarin-English code-switching speech recognition challenge, which aims to improve ASR performance in Mandarin-English code-switching scenarios. 500 hours of Mandarin speech data and 240 hours of Mandarin-English intra-sentential CS data were released to the participants. Three tracks were set up to advance the AM and LM components of the traditional DNN-HMM ASR system and to explore the performance of E2E models. The paper then presents an overview of the results and system performance in the three tracks. It turns out that traditional ASR systems benefit from a pronunciation lexicon, CS text generation, and data augmentation. In the E2E track, however, the results…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research

Methods: Attention Model