Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages
Spandan Dey, Goutam Saha, Md Sahidullah

TL;DR
This study evaluates the performance of spoken language identification systems across different Indian language corpora, highlighting significant performance drops due to corpus mismatch and demonstrating that feature normalization can improve cross-corpora accuracy.
Contribution
One of the first investigations into cross-corpora evaluation for Indian spoken language identification, analyzing corpus mismatch and applying feature normalization to improve performance.
Findings
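Cross-corpora performance degrades significantly: a system trained on IIITH-ILSC goes from an average EER of 11.80% on the same corpus to 43.34% on LDC South Asian.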
Feature normalization improves cross-corpora LID accuracy.
The corpora differ significantly in long-term average spectrum (LTAS) and signal-to-noise ratio (SNR).
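The summary does not say which feature normalization the authors apply. As an illustration only, the sketch below assumes per-utterance cepstral mean and variance normalization (CMVN), a common choice for MFCC features, and uses synthetic arrays in place of real MFCCs. The function name `cmvn` and the two synthetic "corpora" are hypothetical and do not come from the paper.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance mean and variance normalization (assumed technique).

    features has shape (num_frames, num_coeffs); statistics are taken
    over the time axis so each coefficient ends up with zero mean and
    approximately unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


# Synthetic stand-ins for MFCC matrices from two mismatched corpora:
# corpus_b has a different offset and gain, mimicking a channel mismatch.
rng = np.random.default_rng(0)
corpus_a = rng.normal(loc=0.0, scale=1.0, size=(300, 20))
corpus_b = rng.normal(loc=5.0, scale=3.0, size=(300, 20))

norm_a = cmvn(corpus_a)
norm_b = cmvn(corpus_b)

# Largest gap between the per-coefficient means, before and after.
gap_before = np.abs(corpus_a.mean(axis=0) - corpus_b.mean(axis=0)).max()
gap_after = np.abs(norm_a.mean(axis=0) - norm_b.mean(axis=0)).max()
```

After normalization the first- and second-order statistics of the two synthetic sets coincide, which is the kind of mismatch reduction that normalization targets. It does not remove every difference between corpora.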
Abstract
In this paper, we conduct one of the very first studies of cross-corpora performance evaluation for the spoken language identification (LID) problem. Cross-corpora evaluation has not been explored much in LID research, especially for Indian languages. We selected three Indian spoken language corpora: IIITH-ILSC, LDC South Asian, and IITKGP-MLILSC. For each corpus, LID systems are trained with a state-of-the-art time-delay neural network (TDNN) based architecture on MFCC features. We observe that LID performance degrades drastically under cross-corpora evaluation. For example, the system trained on the IIITH-ILSC corpus shows an average EER of 11.80% and 43.34% when evaluated on the same corpus and on the LDC South Asian corpus, respectively. Our preliminary analysis shows significant differences among these corpora in terms of mismatch in the long-term average spectrum…
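The abstract reports results as equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from detection scores by sweeping thresholds. It is not the authors' scoring code, the name `compute_eer` is hypothetical, and the "average EER" in the abstract is presumably a mean over per-language EERs, which is an assumption here.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores) -> float:
    """Estimate the equal error rate from two sets of detection scores.

    Higher scores are assumed to indicate the target language. Every
    observed score is tried as a threshold, and the EER is taken where
    false rejection and false acceptance rates are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection: target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: non-target trials scored at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2.0)


# Toy scores only; these are not results from the paper.
eer_separable = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlapping = compute_eer([0.0, 1.0], [0.0, 1.0])
```

For a multi-language system, a per-language EER can be obtained by treating trials of that language as targets and all others as non-targets, then averaging across languages.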
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
