Deep Triphone Embedding Improves Phoneme Recognition

Mohit Yadav; Vivek Tyagi

arXiv:1710.07868·cs.SD·October 25, 2017·2 cites

TL;DR

This paper introduces Deep Triphone Embeddings (DTE), a novel neural network-based feature representation that enhances phoneme recognition accuracy by capturing contextual speech information more effectively.

Contribution

The paper proposes a new DTE method derived from DNN activations, improving phoneme recognition over traditional triphone systems.

Findings

01. DTE improves phoneme recognition accuracy by 2.11%.

02. DTE encapsulates the discriminative information present in adjoining speech frames.

03. The method outperforms existing triphone-based systems.

Abstract

In this paper, we present a novel Deep Triphone Embedding (DTE) representation, derived from a Deep Neural Network (DNN), that encapsulates the discriminative information present in adjoining speech frames. In the first stage, DTEs are generated using a DNN with four hidden layers of 3000 nodes each. This DNN is trained with tied-triphone classification accuracy as the optimization criterion. We then retain the 3000-dimensional activation vector of the last hidden layer for each MFCC speech frame and apply dimensionality reduction to obtain a 300-dimensional representation, which we term the DTE. DTEs, along with MFCC features, are fed into a second-stage DNN with four hidden layers, which is subsequently trained for tied-triphone classification. Both DNNs are trained using triphone labels generated from a tied-state triphone HMM-GMM system, by performing a…
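The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the networks are untrained random placeholders, the ReLU activation and PCA-based reduction are assumptions (the abstract says only "dimension reduction"), each frame is fed alone rather than with neighboring frames, and the sizes are scaled down by a factor of ten from the 3000 hidden units and 300 DTE dimensions the abstract reports. All function names and the values of `mfcc_dim` and `n_states` are hypothetical.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden_dim, n_hidden, out_dim):
    """Random (untrained) weights for an MLP with n_hidden hidden layers."""
    dims = [in_dim] + [hidden_dim] * n_hidden + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def last_hidden_activations(params, x):
    """Run the hidden layers only and return the last hidden layer's output."""
    h = x
    for w, b in params[:-1]:            # skip the output (classification) layer
        h = np.maximum(h @ w + b, 0.0)  # ReLU: an assumption, not from the abstract
    return h

def pca_reduce(acts, out_dim):
    """Project activations onto their top out_dim principal components.
    PCA is an assumed stand-in for the unspecified reduction step."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

rng = np.random.default_rng(0)
n_frames, mfcc_dim, n_states = 200, 39, 50   # hypothetical sizes
hidden_dim, dte_dim = 300, 30                # abstract: 3000 and 300

mfcc = rng.standard_normal((n_frames, mfcc_dim))  # stand-in for real MFCC frames

# Stage 1: four-hidden-layer DNN (would be trained on tied-triphone labels).
stage1 = init_mlp(rng, mfcc_dim, hidden_dim, 4, n_states)
acts = last_hidden_activations(stage1, mfcc)

# Reduce last-hidden-layer activations to the lower-dimensional DTE.
dte = pca_reduce(acts, dte_dim)

# Stage 2: DTE concatenated with MFCC features feeds a second four-hidden-layer DNN.
stage2_input = np.concatenate([dte, mfcc], axis=1)
stage2 = init_mlp(rng, stage2_input.shape[1], hidden_dim, 4, n_states)
```

In a real system both networks would be trained on forced-alignment labels before the activations are extracted; the sketch only shows how the tensors are shaped and passed between stages.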

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing