Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model
Wei Zhou, Ralf Schlüter, Hermann Ney

TL;DR
This paper introduces a full-sum decoding approach for hybrid HMM speech recognition with LSTM language models, showing consistent improvements over traditional methods by considering all possible state sequences.
Contribution
It proposes a full-sum decoding method that replaces the Viterbi approximation with a summation over all HMM state sequences, so that decoding decisions rest on more accurate probabilities.
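To make the difference between the Viterbi approximation and full summation concrete, here is a minimal sketch on a toy HMM. It is not the paper's prefix-tree search: the two-state model, its probabilities, and the names `score_sequence` and `logsumexp` are assumptions chosen only for illustration. The same recursion is used for both scores; only the operator that merges incoming paths changes (`max` for Viterbi, log-sum-exp for full sum).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scores."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def score_sequence(log_init, log_trans, log_emit, combine):
    """Score one observation sequence over the HMM state paths.

    combine=max        -> Viterbi approximation (best single path)
    combine=logsumexp  -> full sum over all state paths
    log_emit[t][s] is the log-score of frame t in state s.
    """
    n_states = len(log_init)
    # Initialise with the first frame.
    alpha = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    # Merge all predecessor paths into each state, frame by frame.
    for t in range(1, len(log_emit)):
        alpha = [
            combine([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + log_emit[t][s]
            for s in range(n_states)
        ]
    return combine(alpha)


# Hypothetical toy model: 2 states, 2 frames (values are illustrative only).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.5), math.log(0.5)],
             [math.log(0.5), math.log(0.5)]]
log_emit = [[math.log(0.8), math.log(0.2)],
            [math.log(0.5), math.log(0.5)]]

viterbi = score_sequence(log_init, log_trans, log_emit, max)
full_sum = score_sequence(log_init, log_trans, log_emit, logsumexp)

print(f"Viterbi  probability: {math.exp(viterbi):.3f}")
print(f"Full-sum probability: {math.exp(full_sum):.3f}")
```

In this toy case the best single path carries only part of the total probability mass, so the two scores differ. When several hypotheses are compared, such differences can in principle change their ranking, which is the kind of effect on decision making that the paper investigates.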
Findings
Consistent improvements over strong baselines on the Switchboard and Librispeech corpora in almost all cases.
Improvements hold for models trained with both CE and sMBR criteria, and for both MAP and confusion network decoding.
No additional computational cost compared to baseline methods.
Abstract
In hybrid HMM based speech recognition, LSTM language models have been widely applied and achieved large improvements. The theoretical capability of modeling any unlimited context suggests that no recombination should be applied in decoding. This motivates reconsidering full summation over the HMM-state sequences instead of the Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply the full-sum decoding with a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both Switchboard and Librispeech corpora. Different models using CE and sMBR training criteria are used. Additionally, both MAP and confusion network decoding as approximated variants of the general Bayes decision rule are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
