Moving Toward High Precision Dynamical Modelling in Hidden Markov Models

Sébastien Gagnon; Jean Rouat

arXiv:1607.00359 · cs.CL · July 4, 2016

Open Access

TL;DR

This paper argues that finely tuned HMM topologies matter for temporal modelling in speech recognition, and reports experiments in which topologies pruned from complex generic models outperform the traditional left-to-right structure.

Contribution

It introduces a proof-of-concept framework for learning efficient HMM topologies through pruning, enhancing the modeling of complex time dependencies.
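
To make the idea of pruning a generic model concrete, here is a minimal sketch in Python. It is not the authors' procedure: the summary does not say how arcs are selected for removal, so the probability threshold used here is an assumption, as are the names `prune_transitions`, `count_arcs`, and the toy matrix `ergodic`.

```python
def prune_transitions(trans, threshold=0.05):
    """Zero out transitions below `threshold`, then renormalize each row.

    Assumption: a plain probability threshold stands in for whatever
    pruning criterion the paper actually uses.
    """
    pruned = []
    for row in trans:
        kept = [p if p >= threshold else 0.0 for p in row]
        if sum(kept) == 0.0:
            # Never leave a state without an outgoing arc:
            # keep its single most likely transition.
            best = max(range(len(row)), key=row.__getitem__)
            kept[best] = row[best]
        total = sum(kept)
        pruned.append([p / total for p in kept])
    return pruned


def count_arcs(trans):
    """Count the transitions that have non-zero probability."""
    return sum(1 for row in trans for p in row if p > 0.0)


# Hypothetical 4-state fully connected (ergodic) model after training.
ergodic = [
    [0.70, 0.25, 0.03, 0.02],
    [0.02, 0.60, 0.35, 0.03],
    [0.01, 0.04, 0.65, 0.30],
    [0.30, 0.02, 0.03, 0.65],
]

pruned = prune_transitions(ergodic, threshold=0.05)
print(count_arcs(ergodic), "->", count_arcs(pruned))
```

In this toy example the pruned model keeps the arc from the last state back to the first, a transition that a left-to-right topology would rule out by construction.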

Findings

01. Pruned complex models outperform classical left-to-right HMMs in speech recognition.

02. Fine-tuned topologies better capture temporal dependencies.

03. The approach shows promise for improving state-of-the-art HMM systems.

Abstract

The Hidden Markov Model (HMM) is often regarded as the dynamical model of choice in many fields and applications. It has also been at the heart of most state-of-the-art speech recognition systems since the 1970s. However, from Gaussian mixture model HMMs (GMM-HMM) to deep neural network HMMs (DNN-HMM), the underlying Markov chain of state-of-the-art models has not changed much. The "left-to-right" topology is almost always employed because very few alternatives exist. In this paper, we argue that finely tuned HMM topologies are essential for precise temporal modelling and that this approach should be investigated in state-of-the-art HMM systems. To that end, we propose a proof-of-concept framework for learning efficient topologies by pruning down complex generic models. Speech recognition experiments indicate that complex time dependencies can be better learned by this…
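
For readers unfamiliar with the "left-to-right" topology mentioned in the abstract, the following Python sketch shows what the constraint looks like as a transition matrix. The function names, the `self_prob` value, and the absorbing final state are illustrative assumptions, not details taken from the paper.

```python
def left_to_right_matrix(n_states, self_prob=0.6):
    """Build a strict left-to-right transition matrix.

    Each state either loops on itself or advances to the next state.
    Assumption: the last state is absorbing, a simplification; real
    systems usually add an exit transition instead.
    """
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            matrix[i][i] = self_prob
            matrix[i][i + 1] = 1.0 - self_prob
        else:
            matrix[i][i] = 1.0
    return matrix


def is_left_to_right(matrix):
    """Return True if no transition goes back to an earlier state."""
    return all(p == 0.0 for i, row in enumerate(matrix) for p in row[:i])


ltr = left_to_right_matrix(4)
for row in ltr:
    print(row)
```

Everything below the diagonal is zero, so the chain can never revisit an earlier state. A fully connected model has no such restriction, which is why it can serve as a generic starting point for pruning.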

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques