Model Adaptation for ASR in low-resource Indian Languages
Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati

TL;DR
This paper explores adaptation techniques for improving automatic speech recognition in low-resource Indian languages by leveraging similarities among languages and modalities, with experiments on Bengali and Bhojpuri.
Contribution
It investigates the relative importance of acoustic and textual data in low-resource language ASR and proposes adaptation strategies utilizing related languages and shared features.
Findings
Shared linguistic features aid in model adaptation.
Acoustic data can compensate for limited text resources.
The methodology is applicable to other low-resource languages.
Abstract
Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and by large-scale multilingual training as in Whisper. A major challenge remains for low-resource languages, where both audio and text are scarce. This is further complicated by the presence of multiple dialects, as in Indian languages. However, many Indian languages can be grouped into the same families and share the same script and grammatical structure. Here, adaptation and fine-tuning techniques can be applied to overcome the low-resource nature of the data by utilising similar, well-resourced languages. In such scenarios, it is important to understand the extent to which each modality, such as acoustics and text, contributes to building a reliable ASR system. It could be the…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
