Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder
Minkyu Lim, Ji-Hwan Kim

TL;DR
This paper presents a method for integrating TensorFlow-based neural network acoustic models with Kaldi's WFST decoder, so that flexible neural network architectures can be used in speech recognition with performance comparable to Kaldi's own acoustic models.
Contribution
It introduces an integration approach that lets TensorFlow models be used directly within Kaldi's WFST decoding framework, which makes it easier to try different neural network designs in speech recognition.
Findings
TensorFlow acoustic models achieve performance comparable to that of Kaldi models.
The integration enables various neural network architectures to be applied in WFST-based recognition.
It supports online decoding with TensorFlow-based acoustic models.
Abstract
While the Kaldi framework provides state-of-the-art components for speech recognition like feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite state transducer (WFST)-based decoder, it is difficult to implement a new flexible DNN model. By contrast, a general-purpose deep learning framework, such as TensorFlow, can easily build various types of neural network architectures using a tensor-based computation method, but it is difficult to apply them to WFST-based speech recognition. In this study, a TensorFlow-based acoustic model is integrated with a WFST-based Kaldi decoder to combine the two frameworks. The features and alignments used in Kaldi are converted so they can be trained by the TensorFlow model, and the DNN-based acoustic model is then trained. In the integrated Kaldi decoder, the posterior probabilities are calculated by querying the…
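The two conversion steps the abstract mentions (pairing Kaldi features with alignments for training, and turning network posteriors into scores a WFST decoder can use) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not show their data format or scoring code, so the function names `pair_frames` and `posteriors_to_loglikes` are hypothetical, and the prior-division step is the conventional hybrid DNN-HMM practice rather than something confirmed by the text above.

```python
import numpy as np

def pair_frames(feats, pdf_ids, num_pdfs):
    """Pair each feature frame with a one-hot target built from its aligned state id.

    Assumption: feats is a (T, D) matrix and pdf_ids holds T integer state labels,
    i.e. one alignment label per frame.
    """
    feats = np.asarray(feats, dtype=np.float32)
    pdf_ids = np.asarray(pdf_ids, dtype=np.int64)
    if feats.shape[0] != pdf_ids.shape[0]:
        raise ValueError("feature and alignment lengths differ")
    targets = np.zeros((pdf_ids.shape[0], num_pdfs), dtype=np.float32)
    targets[np.arange(pdf_ids.shape[0]), pdf_ids] = 1.0
    return feats, targets

def posteriors_to_loglikes(posteriors, priors, floor=1e-10):
    """Convert state posteriors p(s|x) to pseudo log-likelihoods log p(s|x) - log p(s).

    Assumption: this is the usual hybrid-system conversion; values are floored
    to avoid taking the log of zero.
    """
    post = np.maximum(np.asarray(posteriors, dtype=np.float64), floor)
    pri = np.maximum(np.asarray(priors, dtype=np.float64), floor)
    return np.log(post) - np.log(pri)

# Toy usage: 3 frames of 2-dimensional features aligned to states 0, 2, 1.
x, y = pair_frames([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 2, 1], num_pdfs=3)
scores = posteriors_to_loglikes([[0.5, 0.25, 0.25]], [0.5, 0.25, 0.25])
```

In a real setup the feature matrices and alignments would come from Kaldi's archives and the posteriors from the trained TensorFlow model; the toy arrays here only show the shapes involved.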
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
