Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

Arun Baby; Saranya Vinnaitherthan; Akhil Kerhalkar; Pranav Jawale; Sharath Adavanne; Nagaraj Adiga

arXiv:2206.04305·eess.AS·June 10, 2022·1 cites

Open Access

TL;DR

This paper introduces a post-processing method for ASR systems in Indian languages that recovers about 50% of context-based out-of-vocabulary (OOV) words on average, using a cost function based on phonetic and acoustic knowledge.

Contribution

It proposes a novel post-processing technique with a phonetic and acoustic cost function to recover OOV words, reducing the need for complex model retraining.

Findings

01 Recovers 50% of context-based OOV words on average

02 Effective at both word-level and sentence-level recovery

03 Enhances ASR performance without extensive model modifications

Abstract

Detecting and recovering out-of-vocabulary (OOV) words is always challenging for Automatic Speech Recognition (ASR) systems. Many existing methods focus on modeling OOV words by modifying acoustic and language models and integrating context words cleverly into models. Such complex models require a large amount of data with context words and additional training time, and they increase model size. However, post-processing the ASR transcription to recover context-based OOV words has not been explored much. In this work, we propose a post-processing technique to improve the performance of context-based OOV recovery. We create an acoustically boosted language model with a sub-graph built at the phone level from a list of OOV words. We propose two methods to determine a suitable cost function for retrieving OOV words based on context. The cost function is defined…
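
To make the idea of post-processing recovery concrete, here is a minimal Python sketch. It is not the paper's method: the abstract is truncated before the cost function is defined, so this example swaps in a plain Levenshtein distance over phone sequences, normalized by length. The function names, the toy phone lexicon, and the `max_norm_cost` threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a phone from `a`
                curr[j - 1] + 1,           # insert a phone from `b`
                prev[j - 1] + (pa != pb),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def recover_oov(
    hyp_phones: Sequence[str],
    oov_lexicon: Dict[str, List[str]],
    max_norm_cost: float = 0.34,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return the closest OOV word for a hypothesized phone sequence.

    Returns None when no candidate is within the normalized cost threshold,
    in which case the original ASR word would be kept.
    """
    best_word, best_cost = None, float("inf")
    for word, phones in oov_lexicon.items():
        longest = max(len(phones), len(hyp_phones), 1)
        cost = phone_edit_distance(hyp_phones, phones) / longest
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word if best_cost <= max_norm_cost else None


# Toy OOV list with made-up phone transcriptions (illustrative only).
OOV_LEXICON = {
    "bengaluru": ["b", "e", "ng", "g", "a", "l", "u", "r", "u"],
    "chennai": ["ch", "e", "n", "n", "ai"],
}

if __name__ == "__main__":
    misrecognized = ["b", "e", "ng", "g", "a", "l", "oo", "r", "u"]
    print(recover_oov(misrecognized, OOV_LEXICON))
```

A purely phonetic distance like this ignores acoustic evidence and surrounding context, both of which the abstract says the proposed cost function takes into account.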

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing