Unsupervised Stemming based Language Model for Telugu Broadcast News Transcription
Mythili Sharan Pala, Parayitam Laxminarayana, A.V. Ramana

TL;DR
This paper introduces an unsupervised stemming-based language model for Telugu, improving speech recognition accuracy by effectively handling out-of-vocabulary words through novel morphological processing techniques.
Contribution
It proposes a new unsupervised stemming method for Telugu language modeling that addresses out-of-vocabulary (OOV) words and improves automatic speech recognition (ASR) performance, combined with smoothing and interpolation techniques.
Findings
Witten-Bell and Kneser-Ney smoothing outperform the other smoothing techniques evaluated
ASR accuracy improved by 0.76% with supervised stemming
ASR accuracy improved by 0.94% with unsupervised stemming
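The summary reports that Witten-Bell and Kneser-Ney smoothing performed best, but it does not state the toolkit or n-gram order used. For readers unfamiliar with Witten-Bell, the following is a minimal illustrative sketch of interpolated Witten-Bell smoothing for a bigram model, not the authors' implementation. It assumes whitespace-tokenized sentences and a maximum-likelihood unigram as the lower-order distribution; the function name `train_witten_bell_bigram` is hypothetical. It mixes the two estimates as P(w | h) = (c(h, w) + T(h) · P(w)) / (c(h) + T(h)), where T(h) is the number of distinct words observed after history h. Kneser-Ney is not shown.

```python
from collections import Counter, defaultdict


def train_witten_bell_bigram(sentences):
    """Illustrative sketch: return prob(history, word) for an interpolated
    Witten-Bell bigram model backed by a maximum-likelihood unigram."""
    unigram = Counter()
    bigram = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigram.update(padded[1:])  # count only predicted tokens, not <s>
        for h, w in zip(padded, padded[1:]):
            bigram[(h, w)] += 1
    total = sum(unigram.values())

    history_count = Counter()    # c(h): bigram tokens starting with h
    followers = defaultdict(set)  # distinct words seen after h
    for (h, w), c in bigram.items():
        history_count[h] += c
        followers[h].add(w)

    def prob(history, word):
        p_uni = unigram[word] / total
        c_h = history_count[history]
        t_h = len(followers.get(history, ()))  # T(h); .get avoids inserting keys
        if c_h == 0:
            return p_uni  # unseen history: use the unigram estimate alone
        # More distinct followers (larger T(h)) shifts weight to the unigram.
        return (bigram[(history, word)] + t_h * p_uni) / (c_h + t_h)

    return prob


prob = train_witten_bell_bigram([["a", "b"], ["a", "c"]])
print(prob("a", "b"))
```

On this toy corpus, history "a" is seen twice with two distinct followers, so P(b | a) = (1 + 2 · 1/6) / (2 + 2) = 1/3, and the probabilities over all predicted tokens after "a" sum to 1.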
Abstract
In Indian languages, native speakers are able to understand new words formed by combining root words or by modifying them for tense and/or gender. Due to data insufficiency, an Automatic Speech Recognition (ASR) system may not accommodate all the words of the language in its language model, irrespective of the size of the text corpus. It also becomes computationally challenging when the volume of data increases exponentially due to morphological changes to the root word. In this paper, a new unsupervised method is proposed for the Indian language Telugu, based on the unsupervised method for Hindi, to generate the Out of Vocabulary (OOV) words in the language model. Using smoothing and interpolation of data pre-processed with supervised and unsupervised stemming, different issues in language modeling for Telugu are addressed. We observe that the smoothing techniques…
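The abstract does not describe the stemming algorithm itself, only that it builds on an unsupervised method for Hindi. To illustrate why stemming helps with OOV words in general, here is a minimal sketch assuming a simple frequency-based suffix-discovery heuristic of our own choosing, not the paper's method. The functions `learn_suffixes` and `segment`, their thresholds, and the toy strings are all hypothetical; the strings only mimic agglutinative suffixation and are not real Telugu analyses. A string counts as a suffix if attaching it to several observed words yields other observed words; each word is then split into a stem and a "+"-marked suffix unit.

```python
from collections import Counter


def learn_suffixes(words, min_stem=3, max_suffix=4, min_support=2):
    """Hypothetical heuristic: keep a suffix if at least `min_support`
    observed words equal an observed stem plus that suffix."""
    vocab = set(words)
    support = Counter()
    for word in vocab:
        for k in range(1, max_suffix + 1):
            stem, suffix = word[:-k], word[-k:]
            if len(stem) >= min_stem and stem in vocab:
                support[suffix] += 1
    return {s for s, n in support.items() if n >= min_support}


def segment(word, suffixes, min_stem=3):
    """Split off the longest learned suffix (marked with '+') if the
    remaining stem is at least `min_stem` characters long."""
    for k in range(len(word) - min_stem, 0, -1):
        if word[-k:] in suffixes:
            return [word[:-k], "+" + word[-k:]]
    return [word]


# Toy training words (illustrative strings, not real Telugu data).
train = ["kaba", "kabalo", "kabaki", "mira", "miralo", "miraki", "sonu", "sonulo"]
suffixes = learn_suffixes(train)
units = {u for w in train for u in segment(w, suffixes)}

test_word = "sonuki"  # never seen as a whole word in training
print(test_word in train, segment(test_word, suffixes))
```

Here "sonuki" is OOV at the word level, but it segments into "sonu" and "+ki", both of which occur in the stem/suffix vocabulary, so a language model trained on these units can still score it. The "+" marker is a design choice that lets recognized units be rejoined into surface words after decoding.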
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
