TL;DR
This study investigates how neural spoken language identification models for Slavic languages perform across different acoustic domains, revealing the impact of domain mismatch and demonstrating the effectiveness of unsupervised domain adaptation techniques.
Contribution
The paper provides a comprehensive analysis of domain mismatch effects on neural LID systems for six Slavic languages and applies unsupervised domain adaptation to improve robustness.
Findings
Out-of-domain speech samples significantly reduce LID accuracy.
Spectral features are more robust than cepstral features under domain mismatch.
Unsupervised domain adaptation yields relative accuracy improvements of up to 77%.
Abstract
State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both…
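The abstract contrasts spectral and cepstral descriptors without defining them here. As a rough illustration only, the sketch below assumes the common convention in which spectral features are log mel filterbank energies and cepstral features are obtained by applying a discrete cosine transform (DCT-II) to them, as in MFCCs. The frame values, the function name `dct_ii`, and the number of coefficients are hypothetical and are not taken from the paper.

```python
import math


def dct_ii(log_energies, num_coeffs):
    """Unnormalized DCT-II of one frame of log filterbank energies."""
    n = len(log_energies)
    return [
        sum(
            value * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, value in enumerate(log_energies)
        )
        for k in range(num_coeffs)
    ]


# Hypothetical single frame of 8 mel filterbank energies (not data from the paper).
frame_energies = [1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]

# "Spectral" descriptor under the stated assumption: log filterbank energies.
spectral = [math.log(e) for e in frame_energies]

# "Cepstral" descriptor under the stated assumption: leading DCT-II coefficients.
cepstral = dct_ii(spectral, num_coeffs=4)
```

Under this convention the cepstral vector is a compact, decorrelated summary of the spectral envelope, while the spectral vector keeps the per-band energies. Whether the paper's feature extraction matches this sketch in detail is an assumption.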
