Fine-grained Language Identification with Multilingual CapsNet Model
Mudit Verma, Arun Balaji Buduru

TL;DR
This paper introduces a real-time, fine-grained spoken language identification system using a novel Multilingual CapsNet architecture operating on spectrograms, achieving 91.8% accuracy with minimal data and pre-processing.
Contribution
It proposes a new CapsNet-based model for language identification that outperforms previous RNN and iVector approaches in accuracy and efficiency.
Findings
Achieved 91.8% accuracy on 5-second audio clips.
CapsNet architecture effectively captures language features from spectrograms.
Minimal data and pre-processing required for high accuracy.
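To make the spectrogram input concrete, here is a minimal sketch of turning a 5-second clip into a log-magnitude spectrogram with NumPy. It is an illustration only, not the authors' pipeline: the 16 kHz sample rate, 25 ms Hann window, 10 ms hop, and the `log_spectrogram` helper are all assumptions, and a synthetic 440 Hz tone stands in for real speech.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude STFT spectrogram (hypothetical helper, not from the paper).

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    # Number of full frames that fit in the signal without padding.
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids log(0) on silent bins.
    return np.log(magnitude + eps)


# Assumed settings: 16 kHz audio, 5-second clip, synthetic 440 Hz tone.
sample_rate = 16000
t = np.arange(5 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip)
print(spec.shape)
```

The resulting 2-D array (time frames by frequency bins) is the kind of image-like input a convolutional or capsule network could consume. The actual window sizes and any further scaling used in the paper are not stated in this summary.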
Abstract
Due to a drastic improvement in the quality of internet services worldwide, there is an explosion of multilingual content generation and consumption. This is especially prevalent in countries with large multilingual audiences, who are increasingly consuming media outside their linguistic familiarity/preference. Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis. Accurate and fine-grained spoken language detection is an essential first step for all the subsequent content analysis algorithms. Current techniques in spoken language detection may fall short on one of these fronts: accuracy, fine-grained detection, data requirements, or manual effort in data collection and pre-processing. Hence in this work, a real-time language detection approach to detect spoken language from 5 seconds'…
