ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi

Yu Wang; Chee Siang Leow; Akio Kobayashi; Takehito Utsuro; Hiromitsu Nishizaki

arXiv:2104.01384·eess.AS·August 10, 2021

TL;DR

ExKaldi-RT is a Python-based real-time speech recognition toolkit built on Kaldi, enabling easy development of online ASR systems with neural network integration and demonstrating competitive performance on LibriSpeech.

Contribution

It introduces a Python-compatible extension of Kaldi for online ASR, facilitating research and development with neural network models in real-time recognition.

Findings

01. Achieved competitive real-time ASR performance on LibriSpeech

02. Provides an easy-to-use Python interface for Kaldi-based online recognition

03. Supports neural network-based signal processing and decoding models

Abstract

This paper describes the ExKaldi-RT online automatic speech recognition (ASR) toolkit, which is implemented on top of the Kaldi ASR toolkit and the Python language. ExKaldi-RT provides tools for building online recognition pipelines. While similar Kaldi-based tools are available, a key feature of ExKaldi-RT is that it works in Python, whose easy-to-use interface allows online ASR system developers to pursue original research, for example by applying neural network-based signal processing and decoding models trained with deep learning frameworks. We performed benchmark experiments on the Mini LibriSpeech corpus, which showed that ExKaldi-RT could achieve competitive ASR performance in real-time recognition.
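
The abstract describes ExKaldi-RT as a set of tools for building online recognition pipelines. As a rough illustration of that general idea only, the sketch below chains processing stages through thread-safe queues so that audio chunks flow through feature extraction, an acoustic model and a decoder as they arrive. It also reports the real-time factor (RTF = processing time / audio duration; values below 1 mean faster than real time). Everything here is a hypothetical placeholder and not ExKaldi-RT's or Kaldi's API: `start_stage`, `run_pipeline`, the log-energy "feature", the threshold "acoustic model" and the change-detecting "decoder" are all stand-ins. Because all chunks are queued at once, the RTF measured here reflects throughput, not per-chunk latency.

```python
import math
import queue
import threading
import time
from typing import Callable, List, Tuple

# Hypothetical end-of-stream marker; a unique object so it cannot collide with real data.
END = object()


def start_stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Apply fn to every item from inbox in a worker thread, forwarding results to outbox."""
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is END:
                outbox.put(END)  # propagate end-of-stream to the next stage
                return
            outbox.put(fn(item))

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def log_energy(chunk: List[float]) -> float:
    """Toy feature extractor (hypothetical): log energy of one audio chunk."""
    return math.log(sum(x * x for x in chunk) + 1e-10)


def toy_acoustic_model(feature: float) -> str:
    """Placeholder for a neural acoustic model: thresholds the feature into speech/silence."""
    return "speech" if feature > -5.0 else "sil"


class ChangeDecoder:
    """Placeholder decoder: emits a label only when it differs from the previous chunk's."""

    def __init__(self) -> None:
        self.prev = None

    def __call__(self, label: str) -> str:
        out = label if label != self.prev else ""
        self.prev = label
        return out


def run_pipeline(samples: List[float], chunk_size: int, sample_rate: int) -> Tuple[List[str], float]:
    """Stream samples through the stages chunk by chunk; return decoded labels and RTF."""
    q_audio, q_feat, q_label, q_out = (queue.Queue() for _ in range(4))
    workers = [
        start_stage(log_energy, q_audio, q_feat),
        start_stage(toy_acoustic_model, q_feat, q_label),
        start_stage(ChangeDecoder(), q_label, q_out),
    ]
    start = time.perf_counter()
    for i in range(0, len(samples), chunk_size):
        q_audio.put(samples[i:i + chunk_size])  # simulate audio arriving in chunks
    q_audio.put(END)

    labels = []
    while (item := q_out.get()) is not END:
        if item:  # skip chunks where the decoder emitted nothing
            labels.append(item)
    for worker in workers:
        worker.join()

    elapsed = time.perf_counter() - start
    rtf = elapsed / (len(samples) / sample_rate)  # processing time / audio duration
    return labels, rtf


if __name__ == "__main__":
    # 0.1 s silence, 0.1 s "speech", 0.1 s silence at 16 kHz, in 10 ms chunks.
    audio = [0.0] * 1600 + [0.5] * 1600 + [0.0] * 1600
    labels, rtf = run_pipeline(audio, chunk_size=160, sample_rate=16000)
    print(labels, f"RTF={rtf:.4f}")
```

Running each stage in its own thread lets a slow stage (for example, a neural model) overlap with audio capture and decoding of earlier chunks; in a real system the toy stages would be replaced by actual feature extraction, a trained acoustic model and a WFST or similar decoder.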

Code & Models

Repositories

wangyu09/exkaldi-rt
tf · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing