DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset

Yupei Li; Zifan Wei; Heng Yu; Jiahao Xue; Huichi Zhou; Björn W. Schuller

arXiv:2501.12122 · cs.SD · November 14, 2025

TL;DR

This paper introduces DOTA-ME-CS, a comprehensive daily-oriented Mandarin-English code-switching speech dataset with AI-enhanced diversity, aiming to advance bilingual speech recognition research.

Contribution

The paper presents a new Mandarin-English code-switching speech dataset of 18.54 hours, diversified with AI-augmented data, to fill a gap in resources for bilingual ASR research.

Findings

01. The dataset contains 18.54 hours of audio: 9,300 recordings from 34 participants.

02. AI timbre synthesis, speed variation, and noise addition increase the dataset's diversity and the complexity of the task.

03. The dataset and code will be publicly available.

Abstract

Code-switching, the alternation between two or more languages within communication, poses significant challenges for Automatic Speech Recognition (ASR) systems. Existing models and datasets are limited in their ability to handle these challenges. To address this gap and foster progress in code-switching ASR research, we introduce DOTA-ME-CS, the Daily Oriented Text Audio Mandarin-English Code-Switching dataset, which consists of 18.54 hours of audio data, comprising 9,300 recordings from 34 participants. To enhance the dataset's diversity, we apply artificial intelligence (AI) techniques such as AI timbre synthesis, speed variation, and noise addition, thereby increasing the complexity and scalability of the task. The dataset is carefully curated to ensure both diversity and quality, providing a robust resource for researchers addressing the intricacies of bilingual speech…
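
Two of the augmentations named in the abstract, speed variation and noise addition, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline, which this listing does not describe: the function names `change_speed` and `add_noise`, the linear-interpolation resampling, the Gaussian noise model, and the example rate and SNR values are all assumptions chosen for illustration. AI timbre synthesis is not covered here.

```python
import numpy as np


def change_speed(wave: np.ndarray, rate: float) -> np.ndarray:
    """Speed up (rate > 1) or slow down (rate < 1) by resampling.

    Hypothetical helper: uses plain linear interpolation, so pitch
    shifts along with tempo. The dataset's actual method may differ.
    """
    n_out = int(round(len(wave) / rate))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.linspace(0, len(wave) - 1, num=n_out)
    return np.interp(src_positions, np.arange(len(wave)), wave)


def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise at a target signal-to-noise ratio in dB.

    Hypothetical helper: the noise type and SNR used for the dataset
    are not given in the abstract.
    """
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


# Stand-in for a recording: one second of a 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 220 * t)

fast = change_speed(tone, rate=1.1)                           # about 10% shorter
noisy = add_noise(tone, snr_db=20.0, rng=np.random.default_rng(0))
```

Each augmented copy keeps the original transcript, so a single recording can yield several training examples that differ in tempo or noise level.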

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques