Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN
Yajie Miao

TL;DR
This paper presents open-source recipes for integrating deep neural networks with the Kaldi ASR toolkit, enabling the construction of various DNN-based speech recognition systems with ease.
Contribution
It introduces practical recipes combining Kaldi and PDNN for building DNN, CNN, and bottleneck feature ASR systems, facilitating easier adaptation to new datasets.
Findings
Successfully built DNN hybrid systems using Kaldi and PDNN
Enabled easy adaptation of recipes to different datasets
Based the recipes directly on the Kaldi Switchboard 110-hour setup
Abstract
The Kaldi toolkit is becoming popular for constructing automatic speech recognition (ASR) systems. Meanwhile, in recent years, deep neural networks (DNNs) have shown state-of-the-art performance on various ASR tasks. This document describes our open-source recipes to implement fully-fledged DNN acoustic modeling using Kaldi and PDNN. PDNN is a lightweight deep learning toolkit developed under the Theano environment. Using these recipes, we can build multiple systems, including DNN hybrid systems, convolutional neural network (CNN) systems, and bottleneck feature systems. These recipes are directly based on the Kaldi Switchboard 110-hour setup. However, they are easy to adapt to new datasets.
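For readers unfamiliar with the term "DNN hybrid system", the following is a minimal sketch of the standard hybrid idea: the network produces posteriors over HMM states for each frame, and these are divided by the state priors to obtain scaled likelihoods that a decoder can use. It is not taken from the recipes themselves. The function names, the flooring constant, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Turn raw network outputs for one frame into state posteriors."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, state_priors, floor=1e-10):
    """Hypothetical helper: log p(x|s) up to a constant, computed as
    log p(s|x) - log p(s). The floor (an assumption) avoids log(0)."""
    posteriors = softmax(logits)
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, state_priors)
    ]

# Toy example: one frame, three HMM states (values are made up).
frame_logits = [2.0, 0.5, -1.0]
state_priors = [0.7, 0.2, 0.1]  # e.g. relative state counts from alignments
scores = scaled_log_likelihoods(frame_logits, state_priors)
```

Dividing by the prior matters because frequent states otherwise dominate decoding simply by being common in the training alignments.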
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
