Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

Akash Kumar Dhaka; Giampiero Salvi

arXiv:1606.09163 · cs.CL · June 30, 2016

TL;DR

This paper investigates how shifting the input feature window asymmetrically around the current frame affects the accuracy and latency of CD-DNN based phoneme recognisers. On TIMIT, replacing future context frames with past ones reduces latency without degrading accuracy.

Contribution

The paper presents a systematic analysis of asymmetric input windows in CD-DNN/HMM phoneme recognition, showing that the number of future frames, and with it the latency, can be reduced while maintaining performance.

Findings

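01 Performance does not degrade when the input window is shifted up to 5 frames into the past, at which point no future frames are used.

02 The best results come from an asymmetric window with 8 past and 2 future context frames rather than the commonly used symmetric window; the authors attach a caveat to this observation.

03 Shifting the window by 5 frames improves latency by 50 ms in the authors' settings, without accuracy loss.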
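
The latency figure follows from simple arithmetic: every future frame in the input window delays the output by one frame shift. The sketch below illustrates this under two assumptions that are not stated on this page: a 10 ms frame shift (implied by 5 frames corresponding to 50 ms) and a symmetric baseline with 5 future frames. The function name is hypothetical.

```python
# Assumption: 10 ms frame shift, inferred from "5 frames = 50 ms" in the abstract.
FRAME_SHIFT_MS = 10.0


def lookahead_latency_ms(future_frames, frame_shift_ms=FRAME_SHIFT_MS):
    """Delay caused by waiting for `future_frames` frames after the current one."""
    if future_frames < 0:
        raise ValueError("future_frames must be non-negative")
    return future_frames * frame_shift_ms


# Assumption: the symmetric baseline uses 5 future frames.
baseline_ms = lookahead_latency_ms(5)
# Window shifted 5 frames into the past: no future frames remain.
shifted_ms = lookahead_latency_ms(0)
saving_ms = baseline_ms - shifted_ms

print(f"baseline: {baseline_ms} ms, shifted: {shifted_ms} ms, saving: {saving_ms} ms")
```

This only covers the delay introduced by the look-ahead context; feature extraction and decoding add their own delays, which are not modelled here.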

Abstract

We present a systematic analysis on the performance of a phonetic recogniser when the window of input features is not symmetric with respect to the current frame. The recogniser is based on Context Dependent Deep Neural Networks (CD-DNNs) and Hidden Markov Models (HMMs). The objective is to reduce the latency of the system by reducing the number of future feature frames required to estimate the current output. Our tests performed on the TIMIT database show that the performance does not degrade when the input window is shifted up to 5 frames in the past compared to common practice (no future frame). This corresponds to improving the latency by 50 ms in our settings. Our tests also show that the best results are not obtained with the symmetric window commonly employed, but with an asymmetric window with eight past and two future context frames, although this observation should be…
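
As an illustration of what an asymmetric input window means in practice, the following minimal sketch stacks each feature frame with a chosen number of past and future frames before it would be passed to a network. It is not the authors' implementation: the function name `splice` and the edge handling (repeating the first and last frame) are assumptions made for this example.

```python
def splice(frames, past, future):
    """Stack each frame with `past` preceding and `future` following frames.

    `frames` is a list of feature vectors (lists of floats). At utterance
    edges the first or last frame is repeated (an assumption of this sketch).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - past, t + future + 1):
            idx = min(max(k, 0), n - 1)  # clamp to valid frame indices
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy utterance: four one-dimensional frames.
frames = [[0.0], [1.0], [2.0], [3.0]]

symmetric = splice(frames, past=1, future=1)   # needs 1 future frame
past_only = splice(frames, past=2, future=0)   # needs no future frame

print(symmetric[1])  # frames 0, 1, 2
print(past_only[2])  # frames 0, 1, 2
```

Both calls produce windows of the same width, so the network input size is unchanged; only the position of the current frame inside the window moves. With `future=0` the input for frame `t` is available as soon as frame `t` itself has been computed.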

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing