Adversarial synthesis based data-augmentation for code-switched spoken language identification
Parth Shastri, Chirag Patil, Poorval Wanere, Shrinivas Mahajan, Abhishek Bhatt, Hardik Sailor

TL;DR
This paper introduces a GAN-based data augmentation method using Mel spectrograms to improve spoken language identification for code-switched Hindi-English speech, addressing data scarcity issues.
Contribution
It proposes a novel GAN-based augmentation technique for code-mixed speech data, enhancing LID accuracy in multilingual scenarios.
Findings
GAN augmentation improves Unweighted Average Recall by 3.5%.
Method effectively models minority class data distribution.
Enhances spoken language identification in code-switched speech.
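The Unweighted Average Recall (UAR) cited above is the mean of the per-class recalls, so a small class such as code-switched speech counts as much as the large monolingual classes. The following is a minimal sketch of that metric, not the authors' evaluation code; the class names and the label lists are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced label set: the code-switched class is the minority.
y_true = ["hindi"] * 4 + ["english"] * 4 + ["code_switched"] * 2
y_pred = ["hindi"] * 4 + ["english"] * 4 + ["hindi", "code_switched"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case a single error on the minority class lowers UAR much more than plain accuracy, which is why UAR is a common choice when one class is scarce.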
Abstract
Spoken Language Identification (LID) is an important sub-task of Automatic Speech Recognition (ASR) that is used to classify the language(s) in an audio segment. Automatic LID plays a useful role in multilingual countries. In various countries, identifying a language becomes hard because two or more languages are mixed together during conversation. This phenomenon is called code-mixing or code-switching. It occurs not only in India but also in many Asian countries. Such code-mixed data is hard to find, which further reduces the capabilities of spoken LID. Hence, this work primarily addresses the data scarcity of the code-switched class, using data augmentation as a solution. This study focuses on Indic language code-mixed with English. Spoken LID is performed on Hindi code-mixed with English.…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
