Improving Speech Recognition for Indic Languages using Language Model

Ankur Dhuriya; Harveen Singh Chadha; Anirudh Gupta; Priyanshi Shah; Neeraj Chhimwal; Rishabh Gaur; Vivek Raghavan

arXiv:2203.16595 · cs.CL · June 16, 2022

TL;DR

This paper improves speech recognition for 18 Indic languages by fine-tuning wav2vec 2.0 models and decoding with language models, reducing average CER by over 28% and average WER by about 36%, and enabling domain-specific transcription without retraining the ASR model.

Contribution

It shows that decoding the output of fine-tuned wav2vec 2.0 models with language models trained on text from a variety of sources substantially reduces error rates for Indic language ASR and allows adaptation to a new domain without retraining the ASR model.

Findings

01. Average CER decreases by over 28% after decoding with an LM

02. Average WER decreases by about 36% after decoding with an LM

03. A large LM may not improve substantially on a diverse one
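
The two metrics in these findings are both edit-distance ratios: WER counts word-level edits against the reference and CER counts character-level edits. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code. The function names, the toy English sentences, and the `relative_reduction` helper are hypothetical, and reading the reported percentages as relative reductions is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in an error rate (assumed reading of the reported percentages)."""
    return (before - after) / before


# Hypothetical transcripts for illustration only.
reference = "the cat sat on the mat"
without_lm = "the cat sad on mat"   # one substitution and one deletion
with_lm = "the cat sat on mat"      # one deletion

print(wer(reference, without_lm), wer(reference, with_lm))
print(relative_reduction(wer(reference, without_lm), wer(reference, with_lm)))
```

Averaging such per-utterance or per-language rates before and after LM decoding is one way to arrive at summary figures like those above; the exact aggregation used in the paper is not stated on this page.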

Abstract

We study the effect of applying a language model (LM) to the output of Automatic Speech Recognition (ASR) systems for Indic languages. We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models trained on text derived from a variety of sources. Our findings demonstrate that the average Character Error Rate (CER) decreases by over 28% and the average Word Error Rate (WER) decreases by about 36% after decoding with an LM. We show that a large LM may not provide a substantial improvement compared to a diverse one. We also demonstrate that high-quality transcriptions can be obtained on domain-specific data without retraining the ASR model, and we show results on the biomedical domain.
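
To illustrate why swapping the LM can adapt transcriptions to a domain without touching the ASR model, here is a toy n-best rescoring sketch. It is not the decoding procedure from the paper, which this page does not describe. The unigram LM, the weights `alpha` and `beta`, the acoustic scores, and the example sentences are all hypothetical; the score combination `am + alpha * lm + beta * word_count` is the usual shallow-fusion form.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_unigram_lm(corpus: List[str]) -> Dict[str, float]:
    """Add-one smoothed unigram log-probabilities; "<unk>" covers unseen words."""
    counts = Counter(word for line in corpus for word in line.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for <unk>
    lm = {w: math.log((c + 1) / (total + vocab)) for w, c in counts.items()}
    lm["<unk>"] = math.log(1 / (total + vocab))
    return lm


def lm_log_prob(lm: Dict[str, float], sentence: str) -> float:
    """Sum of unigram log-probabilities over the words of a sentence."""
    return sum(lm.get(w, lm["<unk>"]) for w in sentence.split())


def rescore(nbest: List[Tuple[str, float]], lm: Dict[str, float],
            alpha: float = 0.5, beta: float = 0.0) -> str:
    """Return the hypothesis maximizing am + alpha * lm + beta * word_count."""
    def fused(item: Tuple[str, float]) -> float:
        text, am_score = item
        return am_score + alpha * lm_log_prob(lm, text) + beta * len(text.split())
    return max(nbest, key=fused)[0]


# Hypothetical in-domain text; the ASR model itself is left unchanged.
biomedical_lm = train_unigram_lm([
    "the patient has high blood pressure",
    "blood sugar was high",
])

# Hypothetical n-best list: (transcript, acoustic log-score).
nbest = [
    ("the patient has hi blood sugar", -4.0),
    ("the patient has high blood sugar", -4.2),
]

print(rescore(nbest, biomedical_lm, alpha=0.0))  # acoustic score alone
print(rescore(nbest, biomedical_lm, alpha=0.5))  # with the in-domain LM
```

With `alpha=0.0` the acoustically preferred hypothesis wins; with the in-domain LM weighted in, the hypothesis containing the in-vocabulary word is selected instead. A practical system would use a stronger LM and tune the weights on held-out data.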

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques