Phonetic Temporal Neural Model for Language Identification

Zhiyuan Tang; Dong Wang; Yixiang Chen; Lantian Li; Andrew Abel

arXiv:1705.03151 · cs.CL · August 28, 2017 · 2 cites

TL;DR

This paper introduces a phonetic temporal neural model for language identification (LID): an LSTM-RNN that takes frame-level phonetic features produced by a phone-discriminative DNN as input instead of raw acoustic features. It outperforms existing acoustic neural models, and it outperforms the i-vector approach on short and noisy utterances.

Contribution

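The paper proposes an LSTM-RNN-based LID system whose input is phonetic features rather than raw acoustic features. The phonetic knowledge it uses is richer than in traditional phonetic LID methods: it is at the frame level and involves compacted information of all phones.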

Findings

01 Outperforms existing acoustic neural models

02 Outperforms i-vector approach on short and noisy utterances

03 Demonstrates effectiveness on Babel and AP16-OLR databases

Abstract

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing…
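
The data flow described in the abstract (acoustic frames, then frame-level phonetic features from a phone-discriminative DNN, then an LSTM-RNN that scores languages) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the layer sizes, the single-layer stand-in for the phonetic DNN, the random untrained weights, and the averaging of frame posteriors into an utterance-level score. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustration sizes (hypothetical, not taken from the paper).
N_FRAMES, ACOUSTIC_DIM, PHONETIC_DIM, HIDDEN_DIM, N_LANGS = 50, 40, 16, 8, 3

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def phonetic_features(acoustic, W, b):
    """Stand-in for a hidden layer of a phone-discriminative DNN.

    Maps each acoustic frame to one phonetic feature vector.
    A real system would use a DNN trained to discriminate phones.
    """
    return np.tanh(acoustic @ W + b)

def lstm_lid(phonetic, Wx, Wh, b, Wo, bo):
    """Run a single-layer LSTM over phonetic frames and score languages."""
    h = np.zeros(HIDDEN_DIM)  # hidden state
    c = np.zeros(HIDDEN_DIM)  # cell state
    frame_posteriors = []
    for x in phonetic:
        z = Wx @ x + Wh @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        frame_posteriors.append(softmax(Wo @ h + bo))
    # Assumption: utterance score = mean of the frame-level posteriors.
    return np.mean(frame_posteriors, axis=0)

# Random acoustic frames and untrained weights, for shape checking only.
acoustic = rng.standard_normal((N_FRAMES, ACOUSTIC_DIM))
W_ph = 0.1 * rng.standard_normal((ACOUSTIC_DIM, PHONETIC_DIM))
b_ph = np.zeros(PHONETIC_DIM)
Wx = 0.1 * rng.standard_normal((4 * HIDDEN_DIM, PHONETIC_DIM))
Wh = 0.1 * rng.standard_normal((4 * HIDDEN_DIM, HIDDEN_DIM))
b = np.zeros(4 * HIDDEN_DIM)
Wo = 0.1 * rng.standard_normal((N_LANGS, HIDDEN_DIM))
bo = np.zeros(N_LANGS)

phonetic = phonetic_features(acoustic, W_ph, b_ph)
scores = lstm_lid(phonetic, Wx, Wh, b, Wo, bo)
print(scores.shape, scores.sum())
```

The point of the sketch is the interface: the LSTM never sees `acoustic` directly, only the `phonetic` matrix with one row per frame, which is the distinction the abstract draws from acoustic neural LID models.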

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing