ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi
Yu Wang, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki

TL;DR
ExKaldi-RT is a Python-based real-time speech recognition toolkit built on Kaldi, enabling easy development of online ASR systems with neural network integration and demonstrating competitive performance on LibriSpeech.
Contribution
It introduces a Python-compatible extension of Kaldi for online ASR, facilitating research and development with neural network models in real-time recognition.
Findings
Achieved competitive real-time ASR performance on LibriSpeech
Provides an easy-to-use Python interface for Kaldi-based online recognition
Supports neural network-based signal processing and decoding models
Abstract
This paper describes ExKaldi-RT, an online automatic speech recognition (ASR) toolkit implemented on top of the Kaldi ASR toolkit in Python. ExKaldi-RT provides tools for building online recognition pipelines. While similar Kaldi-based tools exist, a key feature of ExKaldi-RT is that it works in Python, which offers an easy-to-use interface that lets online ASR system developers pursue original research, such as applying neural network-based signal processing and decoding with models trained in deep learning frameworks. We performed benchmark experiments on the minimum LibriSpeech corpus, and the results showed that ExKaldi-RT could achieve competitive ASR performance in real-time recognition.
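To make the idea of an "online recognition pipeline" concrete, the following is a minimal sketch of a staged streaming pipeline in plain Python. It does not use ExKaldi-RT or Kaldi, and it is not the toolkit's API: every name here (`stream_chunks`, `extract_features`, `decode`, `run_pipeline`, the log-energy feature, and the threshold "decoder") is a hypothetical stand-in chosen only to show how audio chunks can flow through concurrent stages connected by queues.

```python
import math
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed between stages


def stream_chunks(samples, chunk_size, out_q):
    """Stage 1: emit fixed-size chunks, as a microphone reader might."""
    for start in range(0, len(samples), chunk_size):
        out_q.put(samples[start:start + chunk_size])
    out_q.put(SENTINEL)


def extract_features(in_q, out_q):
    """Stage 2: reduce each chunk to one toy feature (log mean energy)."""
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            out_q.put(SENTINEL)  # propagate end-of-stream downstream
            return
        energy = sum(x * x for x in chunk) / len(chunk)
        out_q.put(math.log(energy + 1e-10))


def decode(in_q, results, threshold=-5.0):
    """Stage 3: placeholder 'decoder' that labels each chunk by energy.

    A real system would run an acoustic model and a decoder here.
    """
    while True:
        feat = in_q.get()
        if feat is SENTINEL:
            return
        results.append("speech" if feat > threshold else "silence")


def run_pipeline(samples, chunk_size=160):
    """Wire the three stages together, one thread per stage."""
    q1, q2 = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=stream_chunks, args=(samples, chunk_size, q1)),
        threading.Thread(target=extract_features, args=(q1, q2)),
        threading.Thread(target=decode, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Two silent chunks followed by two louder chunks.
    audio = [0.0] * 320 + [0.5] * 320
    print(run_pipeline(audio, chunk_size=160))
```

Because each stage runs in its own thread and communicates only through queues, a stage can begin processing a chunk before the whole utterance has arrived, which is the property an online system needs. Swapping the toy feature or decoder for a neural network model would, under these assumptions, change only the body of one stage.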
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
