Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

Minkyu Lim; Ji-Hwan Kim

arXiv:1906.11018·eess.AS·June 27, 2019·1 cites

Open Access

TL;DR

This paper presents a method to integrate TensorFlow-based neural network acoustic models with Kaldi's WFST decoder, enabling flexible neural network architectures in speech recognition with performance comparable to Kaldi models.

Contribution

It introduces a novel integration approach allowing TensorFlow models to be used directly within Kaldi's WFST decoding framework, facilitating flexible neural network design in speech recognition.

Findings

01 TensorFlow acoustic models achieve comparable performance to Kaldi models.

02 Enables application of various neural network architectures in WFST-based recognition.

03 Supports online decoding with TensorFlow-based acoustic models.

Abstract

While the Kaldi framework provides state-of-the-art components for speech recognition like feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite state transducer (WFST)-based decoder, it is difficult to implement a new flexible DNN model. By contrast, a general-purpose deep learning framework, such as TensorFlow, can easily build various types of neural network architectures using a tensor-based computation method, but it is difficult to apply them to WFST-based speech recognition. In this study, a TensorFlow-based acoustic model is integrated with a WFST-based Kaldi decoder to combine the two frameworks. The features and alignments used in Kaldi are converted so they can be trained by the TensorFlow model, and the DNN-based acoustic model is then trained. In the integrated Kaldi decoder, the posterior probabilities are calculated by querying the…
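
The conversion step mentioned in the abstract (Kaldi features and alignments turned into training data for a TensorFlow model) could look roughly like the sketch below. It is not the authors' code. It assumes the alignments have already been exported as text lines, each holding an utterance ID followed by one integer label per frame, and that the features are already in memory as per-utterance lists of frames. The function names `parse_alignment_text` and `pair_frames_with_labels` are hypothetical, and the toy data is made up.

```python
from typing import Dict, List, Tuple

Frame = List[float]


def parse_alignment_text(lines: List[str]) -> Dict[str, List[int]]:
    """Parse lines of the form '<utt-id> <label> <label> ...' into a dict.

    Assumption: one integer label per frame, whitespace separated.
    """
    alignments: Dict[str, List[int]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alignments[fields[0]] = [int(tok) for tok in fields[1:]]
    return alignments


def pair_frames_with_labels(
    features: Dict[str, List[Frame]],
    alignments: Dict[str, List[int]],
) -> Tuple[List[Frame], List[int]]:
    """Flatten per-utterance features and labels into frame-level examples.

    Utterances with no alignment, or whose frame count differs from the
    label count, are skipped rather than guessed at.
    """
    inputs: List[Frame] = []
    targets: List[int] = []
    for utt_id in sorted(features):
        frames = features[utt_id]
        labels = alignments.get(utt_id)
        if labels is None or len(labels) != len(frames):
            continue
        inputs.extend(frames)
        targets.extend(labels)
    return inputs, targets


if __name__ == "__main__":
    # Toy data: two utterances with 2-dimensional features (made up).
    feats = {
        "utt1": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        "utt2": [[1.0, 1.1], [1.2, 1.3]],
    }
    ali = parse_alignment_text(["utt1 4 4 7", "utt2 2 9"])
    x, y = pair_frames_with_labels(feats, ali)
    print(len(x), y)
```

The resulting frame-level inputs and integer targets are the kind of paired data a frame classifier can be trained on. How the paper itself stores or batches the converted data is not stated in the truncated abstract.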

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing