Towards Relevance and Sequence Modeling in Language Recognition
Bharat Padi, Anand Mohan, Sriram Ganapathy

TL;DR
This paper introduces a neural network framework with relevance weighting for language recognition that effectively models sequence information, improving accuracy especially in noisy and multi-speaker scenarios.
Contribution
It proposes a novel relevance-weighted neural model for language recognition that uses a bidirectional long short-term memory (BLSTM) network with attention, incorporating sequence information in a way that outperforms conventional methods.
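The following is a minimal sketch of relevance-weighted (attention) pooling. It assumes that the frame-level hidden states `h` have already been produced by a BLSTM, which is not shown, and that each frame's relevance is scored with a single vector `w`. The function name `attention_pool` and this scoring form are illustrative choices. They are not the authors' exact architecture.

```python
import numpy as np
from typing import Tuple


def attention_pool(h: np.ndarray, w: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Relevance-weighted pooling of frame-level hidden states.

    h: (T, D) per-frame hidden states (assumed to come from a BLSTM).
    w: (D,) relevance-scoring vector (hypothetical; learned in a trained model).
    Returns the (D,) utterance embedding and the (T,) relevance weights.
    """
    scores = h @ w                               # one scalar relevance score per frame
    scores = scores - scores.max()               # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    return alpha @ h, alpha                      # relevance-weighted sum of frames


# Toy input: the third frame has the largest projection onto w.
h = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
embedding, weights = attention_pool(h, np.array([1.0, 1.0]))
print(weights.round(4), embedding.round(4))
```

If `w` is all zeros, every frame receives the same weight and the result reduces to plain mean pooling. A learned `w` lets the frames judged most relevant dominate the utterance-level embedding.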
Findings
Significant improvements over i-vector/x-vector approaches.
Effective in noisy and multi-speaker conditions.
Demonstrated on NIST LRE 2017 and RATS datasets.
Abstract
The task of automatic language identification (LID) involving multiple dialects of the same language family in the presence of noise is a challenging problem. In these scenarios, the identity of the language/dialect may be reliably present only in parts of the temporal sequence of the speech signal. The conventional approaches to LID (and to speaker recognition) ignore the sequence information by extracting a long-term statistical summary of the recording, assuming independence of the feature frames. In this paper, we propose a neural network framework that utilizes short-sequence information in language recognition. In particular, a new model is proposed for incorporating relevance in language recognition, where portions of the speech data are weighted more heavily based on their relevance to the language recognition task. This relevance weighting is achieved using the bidirectional long short-term…
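You can check directly that a long-term statistical summary discards sequence information. The sketch below uses per-dimension mean and standard-deviation pooling as a simple stand-in for such a summary. This is an assumption: it is not the exact i-vector or x-vector computation. The features are random placeholders rather than real acoustic frames. Shuffling the frames destroys their temporal order but leaves the summary unchanged.

```python
import numpy as np


def stats_pool(frames: np.ndarray) -> np.ndarray:
    """Utterance-level summary: per-dimension mean and standard deviation.

    frames: (T, D) feature frames. Returns a (2*D,) vector.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 20))                # placeholder: 200 frames of 20-dim features
shuffled = frames[rng.permutation(len(frames))]    # same frames, temporal order destroyed

# Both orderings yield the same summary (up to floating-point rounding),
# so any information carried by the frame order is lost.
print(np.allclose(stats_pool(frames), stats_pool(shuffled)))  # True
```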
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
