Adversarial synthesis based data-augmentation for code-switched spoken language identification

Parth Shastri; Chirag Patil; Poorval Wanere; Shrinivas Mahajan; Abhishek Bhatt; Hardik Sailor

arXiv:2205.15747 · eess.AS · September 2, 2024

Open Access

TL;DR

This paper introduces a GAN-based data augmentation method using Mel spectrograms to improve spoken language identification for code-switched Hindi-English speech, addressing data scarcity issues.

Contribution

It proposes a novel GAN-based augmentation technique for code-mixed speech data, enhancing LID accuracy in multilingual scenarios.

Findings

1. GAN augmentation improves Unweighted Average Recall by 3.5%.
2. Method effectively models minority class data distribution.
3. Enhances spoken language identification in code-switched speech.
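
Unweighted Average Recall (UAR), the metric cited in the first finding, is conventionally the mean of the per-class recalls, so every class counts equally regardless of how many examples it has. The following is a minimal sketch in Python that assumes this standard definition. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one labeled example")
    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Made-up labels for illustration: "cs" (code-switched) is the minority class.
y_true = ["hi", "hi", "hi", "hi", "en", "en", "en", "en", "cs", "cs"]
y_pred = ["hi", "hi", "hi", "hi", "en", "en", "en", "hi", "cs", "en"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these toy labels the plain accuracy is 0.80 while the UAR is 0.75, because the 0.5 recall on the two-example minority class is weighted the same as the recall on the larger classes. This is why UAR is a common choice when one class, such as code-switched speech, is scarce.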

Abstract

Spoken Language Identification (LID) is an important sub-task of Automatic Speech Recognition (ASR) that is used to classify the language(s) in an audio segment. Automatic LID plays a useful role in multilingual countries. In many countries, identifying a language becomes hard because two or more languages are mixed together during conversation. This phenomenon is called code-mixing or code-switching. It occurs not only in India but also in many other Asian countries. Such code-mixed data is hard to find, which further limits the capabilities of spoken LID. Hence, this work primarily addresses the data scarcity of the code-switched class, using data augmentation as a solution. This study focuses on an Indic language code-mixed with English. Spoken LID is performed on Hindi code-mixed with English.…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing