Fine-grained Language Identification with Multilingual CapsNet Model

Mudit Verma; Arun Balaji Buduru

arXiv:2007.06078·eess.AS·July 14, 2020

TL;DR

This paper introduces a real-time, fine-grained spoken language identification system using a novel Multilingual CapsNet architecture operating on spectrograms, achieving 91.8% accuracy with minimal data and pre-processing.

Contribution

It proposes a new CapsNet-based model for language identification that outperforms previous RNN and iVector approaches in accuracy and efficiency.

Findings

01. Achieved 91.8% accuracy on 5-second audio clips.

02. The CapsNet architecture effectively captures language features from spectrograms.

03. Minimal data and pre-processing are required for high accuracy.
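
To make the spectrogram input mentioned above concrete, here is a minimal sketch of turning a 5-second clip into a log-power spectrogram with NumPy. It is an illustration only, not the authors' pipeline: the 16 kHz sample rate, the 25 ms frame and 10 ms hop, the Hann window, the synthetic 440 Hz tone, and the helper name `log_spectrogram` are all assumptions that do not come from the paper.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a log-power spectrogram of shape (freq_bins, time_frames).

    Hypothetical helper: frame and hop sizes are illustrative defaults,
    not values taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame_len)

    # Slice the signal into overlapping, windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([
        signal[i * hop_len:i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

    # Power spectrum of each frame (real-input FFT), then decibels.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10).T


# Assumed stand-in for real speech: a 5-second, 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(5 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip, sample_rate)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image, which is the kind of input an image-style classifier such as a capsule network would consume.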

Abstract

Due to a drastic improvement in the quality of internet services worldwide, there has been an explosion of multilingual content generation and consumption. This is especially prevalent in countries with large multilingual audiences, who are increasingly consuming media outside their linguistic familiarity or preference. Hence, there is an increasing need for real-time, fine-grained content analysis services, including language identification, content transcription, and analysis. Accurate, fine-grained spoken language detection is an essential first step for all subsequent content analysis algorithms. Current techniques in spoken language detection may fall short on at least one of these fronts: accuracy, fine-grained detection, data requirements, and manual effort in data collection and pre-processing. Hence in this work, a real-time language detection approach to detect spoken language from 5 seconds'…
