HiACC: Hinglish adult & children code-switched corpus
Shruti Singh, Muskaan Singh, Virender Kadyan

TL;DR
The paper introduces HiACC, a new Hinglish code-switched speech corpus covering both adult and child speakers in India, intended to improve ASR systems.
Contribution
The paper presents the first publicly available code-switched Hinglish speech corpus with recordings from both adults and children.
Findings
HiACC includes 3,318 adult and 1,858 children audio segments with detailed annotations.
Baseline ASR models show a 42% increase in WER on code-switched speech compared to monolingual input.
The corpus is publicly available for research at the provided Zenodo link.
Abstract
Code-switching is the frequent alternation between two or more languages within a single utterance and is a widespread phenomenon among bilingual and multilingual speakers. In India, more than 250 million people are estimated to engage in code-switched communication, especially blending English with Hindi (Hinglish), making it one of the largest bilingual populations globally. This makes it challenging to develop accurate and robust Automatic Speech Recognition (ASR) systems. Existing ASR models, typically trained on monolingual corpora, struggle with code-switched input due to a lack of large, balanced, and representative datasets, particularly for diverse age groups. Recent evaluations have shown that ASR models experience a relative increase in Word Error Rate (WER) of 30–50% when exposed to code-switched speech compared to monolingual input. To address this resource gap, we introduce a…
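To make the "relative increase in WER" figure concrete, the following is a minimal sketch of how WER and its relative increase are commonly computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_increase` and the numbers in the usage note are illustrative assumptions, and text normalization (casing, punctuation, transliteration) is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_increase(wer_monolingual: float, wer_code_switched: float) -> float:
    """Relative (not absolute) increase, as a fraction of the monolingual WER."""
    if wer_monolingual <= 0:
        raise ValueError("monolingual WER must be positive")
    return (wer_code_switched - wer_monolingual) / wer_monolingual
```

As a purely illustrative case (numbers not taken from the paper), a model with 20% WER on monolingual input and 28% WER on code-switched input has an absolute increase of 8 points but a relative increase of about 40%, which falls inside the 30–50% range quoted above.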
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
