Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets
Or Haim Anidjar, Roi Yozevitch

TL;DR
This paper presents a novel multilingual neural spoken language recognition system that employs optimized Time Delay Neural Networks with a specialized pooling layer, achieving 97% accuracy across ten diverse languages.
Contribution
It introduces an improved Time Delay Neural Network (TDNN) architecture with a funnel shape, tuned through an extensive grid search over hyperparameters, advancing multilingual spoken language recognition.
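The hyperparameter tuning mentioned here is described in the abstract as a grid search. A minimal sketch of that idea follows. The search space, the parameter names, and the `toy_evaluate` objective are all hypothetical placeholders, since this summary does not list the paper's actual grid; in practice `evaluate` would train a model and return its validation accuracy.

```python
from itertools import product

# Hypothetical search space; the paper's real grid is not given in this summary.
SEARCH_SPACE = {
    "num_layers": [5, 6, 7],
    "first_layer_width": [256, 512],
    "learning_rate": [1e-3, 1e-4],
}


def grid_search(space, evaluate):
    """Score every combination in `space` and return the best (config, score)."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder objective (assumption): stands in for validation accuracy."""
    return (
        -abs(config["num_layers"] - 6)
        - abs(config["first_layer_width"] - 512) / 512
        - abs(config["learning_rate"] - 1e-3)
    )


best_config, best_score = grid_search(SEARCH_SPACE, toy_evaluate)
```

Exhaustive search is simple and reproducible, but its cost grows multiplicatively with each added hyperparameter (here 3 x 2 x 2 = 12 runs).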
Findings
Achieved 97% language recognition accuracy.
Enhanced TDNN architecture with additional layers and funnel structure.
Effective use of augmented data for training.
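To make the funnel idea concrete, here is a small pure-Python sketch of stacked TDNN layers. It reads "funnel" as layer widths that narrow with depth, which is an assumption; the widths, context offsets, and random weights below are illustrative and are not the paper's configuration. Each layer splices neighbouring frames at fixed time offsets, then applies a linear map and ReLU.

```python
import random


def tdnn_layer(frames, weights, offsets):
    """One TDNN layer: splice frames at `offsets`, then linear map + ReLU.

    frames:  list of T feature vectors (lists of floats)
    weights: out_dim rows, each of length len(offsets) * in_dim
    offsets: temporal context relative to the current frame, e.g. [-1, 0, 1]
    """
    start = max(0, -min(offsets))
    stop = len(frames) - max(0, max(offsets))
    outputs = []
    # Emit an output only where the whole context window fits in the sequence.
    for t in range(start, stop):
        spliced = [x for k in offsets for x in frames[t + k]]
        outputs.append(
            [max(0.0, sum(w * x for w, x in zip(row, spliced))) for row in weights]
        )
    return outputs


def random_weights(out_dim, in_dim, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


# Hypothetical funnel: (layer width, context offsets); widths shrink with depth.
FUNNEL = [
    (32, [-2, -1, 0, 1, 2]),
    (16, [-2, 0, 2]),
    (8, [-3, 0, 3]),
]


def funnel_tdnn(frames, funnel=FUNNEL, seed=0):
    """Run frames through each funnel layer in turn."""
    rng = random.Random(seed)
    in_dim = len(frames[0])
    for out_dim, offsets in funnel:
        weights = random_weights(out_dim, len(offsets) * in_dim, rng)
        frames = tdnn_layer(frames, weights, offsets)
        in_dim = out_dim
    return frames


# Usage: 30 frames of 4-dimensional features (made-up input).
_rng = random.Random(1)
example_frames = [[_rng.random() for _ in range(4)] for _ in range(30)]
example_output = funnel_tdnn(example_frames)
```

Each layer trims the frames whose context window would run off the sequence edges, so the 30 input frames shrink to 26, then 22, then 16 as the feature width narrows from 32 to 8.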
Abstract
In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on effectively capturing language characteristics over extended periods using a specialized pooling layer. We used a broad range of data from Common-Voice, targeting ten languages across the Indo-European, Semitic, and East Asian families. The major innovation involved optimizing the architecture of Time Delay Neural Networks. We introduced additional layers and restructured these networks into a funnel shape, enhancing their ability to process complex linguistic patterns. A rigorous grid search determined the optimal settings for these networks, significantly boosting their efficiency in language pattern recognition from audio samples. The model underwent extensive training, including a phase with augmented data, to refine its capabilities.…
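The abstract does not say how the specialized pooling layer works. As an illustration only, the sketch below shows statistics pooling, a common choice in TDNN-based systems, which summarizes a variable-length sequence of frame-level vectors as per-dimension means and standard deviations. Treating the paper's pooling layer as statistics pooling is an assumption, and the function name is hypothetical.

```python
import math


def statistics_pooling(frames):
    """Collapse T frame-level vectors into one vector: means, then std devs."""
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / t for d in range(dim)]
    # Population standard deviation per feature dimension.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / t)
        for d in range(dim)
    ]
    return means + stds


# Usage: two frames of 2-dimensional features (made-up values).
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0]])
```

Because the output length depends only on the feature dimension and not on the number of frames, utterances of any duration map to a fixed-size vector that a classifier can consume.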
Taxonomy
Topics: Neural Networks and Applications
