Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

Shashi Kant Gupta; Sushant Hiray; Prashant Kukde

arXiv:2306.00736 · eess.AS · October 5, 2023 · 1 citation

TL;DR

This paper presents a two-stage Encoder-Decoder end-to-end (E2E) model for robust language identification under challenging speech conditions. On Singaporean child-directed code-switched speech it reaches an equal error rate (EER) of 15.6% in the closed track and 11.1% in the open track, compared with 22.1% for the baseline system.

Contribution

Introduces a lightweight (about 22.1 M parameters) Encoder-Decoder model tailored to non-standard, accented, spontaneous code-switched speech, and releases curated datasets for public use.
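A standard reason convolutional encoders like the one described in the abstract stay small is the depth-wise separable convolution: one small kernel per input channel (depth-wise), followed by a 1x1 convolution that mixes channels (point-wise). The sketch below is not taken from the paper or its repository; it only counts parameters for the two layer types. The channel count and kernel size in the example are hypothetical values chosen for illustration.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Weights (+ optional biases) of a standard 1D convolution."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def separable_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Depth-wise 1D conv (one kernel per input channel) followed by a 1x1 point-wise conv."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only (not from the paper).
    channels, kernel = 256, 5
    standard = conv1d_params(channels, channels, kernel)
    separable = separable_conv1d_params(channels, channels, kernel)
    print(standard, separable, round(standard / separable, 2))
```

For these sizes the separable layer needs 67,328 parameters instead of 327,936, roughly 4.9 times fewer. As the channel count grows, the ratio approaches the kernel size, because the C²·k term of the standard convolution is replaced by a C² point-wise term.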

Findings

1. Achieved an EER of 15.6% in the closed track.
2. Achieved an EER of 11.1% in the open track.
3. Curated and released additional Singaporean speech data.
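The first two findings are equal error rates (EER), meaning the operating point where the false-acceptance rate equals the false-rejection rate. As a rough illustration, and not the challenge's official scoring script, the sketch below approximates EER by trying every observed score as a decision threshold and keeping the one where the two error rates are closest. Production scorers usually interpolate along the ROC curve, so results can differ slightly on small trial lists.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by sweeping each observed score as a threshold.

    labels: 1 for target trials (the hypothesised language is correct), 0 otherwise.
    A trial is accepted when its score >= threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false acceptances
        frr = sum(s < t for s in targets) / len(targets)  # false rejections
        gap = abs(far - frr)
        if gap < best_gap:
            # Average the two rates at the threshold where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1], [1, 1, 0, 1, 0, 1, 0, 0])` returns 0.25: at threshold 0.6, one of four non-targets is accepted and one of four targets is rejected.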

Abstract

This work improves Spoken Language Identification (LangId) for a challenge on building robust systems that remain reliable on non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based end-to-end (E2E) model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers that use global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The model has about 22.1 M parameters in total, which is relatively light compared with large-scale pre-trained speech models. We achieved an EER of 15.6% in the closed track and 11.1% in the open track (baseline system: 22.1%). We also curated additional…
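To make the two modules more concrete, here is a dependency-free sketch of (a) a Squeeze-and-Excitation gate that uses global context, where channel weights are computed from the average over all frames, and (b) attentive temporal pooling, which turns any number of frames into one fixed-length vector. This is an illustrative assumption, not the authors' code (their PyTorch implementation is in the shashikg/lid-code-switching repository). Details such as biases, the reduction ratio, the activation, the attention scoring network (a single linear vector `v` here), and whether a weighted standard deviation is pooled as well may differ in the paper.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major list of rows; features are (T frames, C channels)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """SE gate with global context (sketch; biases omitted).

    x: (T, C) features; w1: (C/r, C) reduction weights; w2: (C, C/r) expansion weights.
    """
    T, C = len(x), len(x[0])
    # Squeeze: average every channel over ALL frames (the "global context").
    context = [sum(frame[c] for frame in x) / T for c in range(C)]
    # Excite: bottleneck MLP (ReLU assumed), then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, context))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale each channel of every frame by its gate.
    return [[frame[c] * gates[c] for c in range(C)] for frame in x]


def attentive_pool(x: Matrix, v: List[float]) -> List[float]:
    """Softmax attention over frames -> one C-dim vector, independent of T.

    Scores are a linear projection v . h_t (an assumption; many systems use a
    small tanh MLP instead). A shared bias would cancel in the softmax, so none is used.
    """
    scores = [sum(w * f for w, f in zip(v, frame)) for frame in x]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]  # attention weights, sum to 1 over time
    C = len(x[0])
    return [sum(a * frame[c] for a, frame in zip(alphas, x)) for c in range(C)]
```

Because the attention weights are normalised over however many frames are present, a long and a short utterance both map to a C-dimensional vector, which is what lets the decoder produce a time-independent representation. With all-zero scoring weights the pooling reduces to a plain mean over frames.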

Code & Models

Repositories

shashikg/lid-code-switching (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing