TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

Chengfei Li; Shuhao Deng; Yaoping Wang; Guangjing Wang; Yaguang Gong; Changbin Chen; Jinfeng Bai

arXiv:2206.13135·cs.CL·June 28, 2022

TL;DR

This paper presents TALCS, which the authors describe as the largest open-source Mandarin-English code-switching speech corpus, and establishes baseline speech recognition systems on it using the ESPnet and Wenet toolkits.

Contribution

Introduces TALCS, an open-source Mandarin-English code-switching speech corpus that the authors believe to be the largest of its kind, together with baseline ASR systems built in ESPnet and Wenet as a reference point for future research.

Findings

01

TALCS contains approximately 587 hours of speech data.

02

Baseline ASR systems were built with ESPnet and Wenet and evaluated by Mixture Error Rate (MER).

03

The corpus is freely available for download under a permissive license.

Abstract

This paper introduces TALCS, a new Mandarin-English code-switching speech corpus suitable for training and evaluating code-switching speech recognition systems. The TALCS corpus is derived from real online one-to-one English teaching sessions at TAL Education Group and contains roughly 587 hours of speech sampled at 16 kHz. To the best of our knowledge, TALCS is the largest well-labeled, open-source Mandarin-English code-switching automatic speech recognition (ASR) dataset in the world. We describe the recording procedure in detail, including the audio capturing devices and corpus environments. The TALCS corpus is freely available for download under a permissive license. Using the TALCS corpus, we conduct ASR experiments in two popular speech recognition toolkits, ESPnet and Wenet, to build a baseline system. The Mixture Error Rate (MER)…
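
The abstract names the Mixture Error Rate (MER) but is cut off before defining it. As a rough sketch, assuming the definition commonly used in Mandarin-English code-switching work (each Mandarin character and each English word counts as one token, and the rate is the token-level edit distance divided by the reference length), MER could be computed roughly as follows. The function names and the tokenization regex are illustrative assumptions; they are not taken from the paper, ESPnet, or Wenet.

```python
import re

# Assumption: one CJK ideograph is one token; a run of Latin letters,
# digits, or apostrophes is one token. Punctuation and spaces are dropped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize(text):
    """Split mixed Mandarin-English text into MER tokens (lowercased)."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def mixture_error_rate(ref_text, hyp_text):
    """Token-level edit distance divided by the reference token count."""
    ref = tokenize(ref_text)
    hyp = tokenize(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, with the reference "我们来读 this book" (four characters plus two words, six tokens) and the hypothesis "我们来读 the book", there is one substitution, so `mixture_error_rate` returns 1/6. Production scoring scripts usually also normalize text (numerals, punctuation, casing) before counting, which this sketch only partly does.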

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Pointwise Convolution · Dilated Convolution · Hierarchical Feature Fusion · Efficient Spatial Pyramid · Convolution · Parameterized ReLU · Kaiming Initialization · 1x1 Convolution · ESPNet