Phonetic Temporal Neural Model for Language Identification
Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

TL;DR
This paper introduces a phonetic temporal neural model for language identification (LID). It feeds frame-level phonetic features from a phone-discriminative DNN into an LSTM-RNN, which significantly improves accuracy over existing acoustic neural LID models and outperforms the i-vector approach on short and noisy utterances.
Contribution
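The paper proposes an LSTM-RNN-based LID system that takes phonetic features, rather than raw acoustic features, as input. These features are computed at the frame level and carry compacted information about all phones, so they give the model richer phonetic knowledge than conventional phonetic LID systems use.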
Findings
Outperforms existing acoustic neural models
Outperforms i-vector approach on short and noisy utterances
Demonstrates effectiveness on Babel and AP16-OLR databases
Abstract
Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing…
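To make the pipeline described in the abstract concrete, here is a minimal NumPy sketch of the forward pass. A phone-discriminative DNN maps acoustic frames to frame-level phonetic features, and an LSTM-RNN consumes those features instead of the raw acoustics. This is an illustration under stated assumptions, not the authors' implementation. The names `PhoneDNN`, `LSTMLID`, and `identify_language`, the layer sizes, the use of the DNN's last hidden layer as the phonetic feature, and the averaging of frame-level log posteriors into an utterance score are all hypothetical choices. The weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

class PhoneDNN:
    """Hypothetical phone-discriminative DNN: acoustic frames -> phonetic features."""
    def __init__(self, n_acoustic, n_hidden, n_phones):
        self.W1 = rng.standard_normal((n_acoustic, n_hidden)) * 0.1
        self.W2 = rng.standard_normal((n_hidden, n_hidden)) * 0.1
        # Phone classifier head: only needed when training the DNN on phone targets.
        self.Wout = rng.standard_normal((n_hidden, n_phones)) * 0.1

    def phonetic_features(self, frames):
        # Assumption: the last hidden layer serves as the frame-level phonetic feature.
        return relu(relu(frames @ self.W1) @ self.W2)

    def phone_log_posteriors(self, frames):
        # Frame-level phone posteriors (the DNN's training objective).
        return log_softmax(self.phonetic_features(frames) @ self.Wout)

class LSTMLID:
    """Hypothetical single-layer LSTM that emits language log posteriors per frame."""
    def __init__(self, n_in, n_cell, n_langs):
        self.n_cell = n_cell
        self.W = rng.standard_normal((n_in + n_cell, 4 * n_cell)) * 0.1
        self.b = np.zeros(4 * n_cell)
        self.Wy = rng.standard_normal((n_cell, n_langs)) * 0.1

    def frame_log_posteriors(self, feats):
        h = np.zeros(self.n_cell)
        c = np.zeros(self.n_cell)
        outs = []
        for x in feats:  # process frames in temporal order
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(log_softmax(h @ self.Wy))
        return np.stack(outs)

def identify_language(frames, dnn, lid):
    feats = dnn.phonetic_features(frames)        # phonetic input, not raw acoustics
    frame_scores = lid.frame_log_posteriors(feats)
    return frame_scores.mean(axis=0)             # assumption: average over frames

# Usage with random data: 200 frames of 40-dim acoustic features, 3 languages.
frames = rng.standard_normal((200, 40))
dnn = PhoneDNN(n_acoustic=40, n_hidden=64, n_phones=30)
lid = LSTMLID(n_in=64, n_cell=32, n_langs=3)
scores = identify_language(frames, dnn, lid)
predicted_language = int(np.argmax(scores))
```

In an acoustic LSTM LID baseline, `identify_language` would pass `frames` directly to `LSTMLID`. The only structural change in this sketch is that the frames first pass through `PhoneDNN.phonetic_features`.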
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
