A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework

Ilnar Salimzianov

arXiv:2107.10637·eess.AS·November 30, 2021

Open Access

TL;DR

This paper presents a new, computationally inexpensive Kazakh speech recognition baseline model using the Coqui STT framework, aiming for efficient inference on commodity devices without GPUs.

Contribution

It introduces a lightweight Kazakh acoustic model and three language models for the Coqui STT framework, intended for inference on commodity hardware without a GPU, where existing high-accuracy systems are too slow.

Findings

1. Achieved promising initial results with the new acoustic and language models

2. Identified a need for further training epochs and parameter sweeping or, alternatively, a restricted vocabulary

3. Demonstrated the feasibility of computationally inexpensive speech recognition for Kazakh

Abstract

Mobile devices are transforming the way people interact with computers, and speech interfaces to applications are ever more important. Recently published Automatic Speech Recognition systems are very accurate, but often require powerful machinery (specialised Graphics Processing Units) for inference, which makes them impractical to run on commodity devices, especially in streaming mode. Impressed by the accuracy of the baseline Kazakh ASR model of Khassanov et al. (2021), but dissatisfied with its inference times when not using a GPU, we trained a new baseline acoustic model (on the same dataset as the aforementioned paper) and three language models for use with the Coqui STT framework. Results look promising, but further epochs of training and parameter sweeping or, alternatively, limiting the vocabulary that the ASR system must support, is needed to reach a production-level…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling