Hindi-English Code-Switching Speech Corpus
Ganji Sreeram, Kunal Dhawan, Rohit Sinha

TL;DR
This paper introduces a new Hindi-English code-switching speech corpus designed to facilitate the development of automatic speech recognition systems and other speech processing applications in multilingual Indian contexts.
Contribution
It presents the creation and analysis of a comprehensive Hindi-English code-switching speech database covering diverse speaker and session variations. Resources of this kind have so far been scarce.
Findings
The corpus contains diverse speech samples that vary in speaker attributes such as pronunciation, accent, age, and gender.
Baseline ASR performance results demonstrate the corpus's utility.
The database supports multiple speech processing tasks, such as language identification and speech synthesis.
Abstract
Code-switching refers to the use of two languages within a sentence or discourse. It is a global phenomenon among multilingual communities and has emerged as an independent area of research. With the increasing demand for code-switching automatic speech recognition (ASR) systems, the development of a code-switching speech corpus has become highly desirable. However, only very limited code-switched resources are available so far for training such systems. In this work, we present our first efforts in building a code-switching ASR system in the Indian context. For that purpose, we have created a Hindi-English code-switching speech database. The database not only contains speech utterances with code-switching properties but also covers session and speaker variations such as pronunciation, accent, age, and gender. This database can be applied in several speech signal processing…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Dialogue Systems
