Model Adaptation for ASR in low-resource Indian Languages

Abhayjeet Singh; Arjun Singh Mehta; Ashish Khuraishi K S; Deekshitha G; Gauri Date; Jai Nanavati; Jesuraja Bandekar; Karnalius Basumatary; Karthika P; Sandhya Badiger; Sathvik Udupa; Saurabh Kumar; Savitha; Prasanta Kumar Ghosh; Prashanthi V; Priyanka Pai; Raoul Nanavati; Rohan Saxena; Sai Praneeth Reddy Mora; Srinivasa Raghavan

arXiv:2307.07948·eess.AS·July 18, 2023·5 cites

Open Access

TL;DR

This paper explores adaptation techniques for improving automatic speech recognition in low-resource Indian languages by leveraging similarities among languages and modalities, with experiments on Bengali and Bhojpuri.

Contribution

It investigates the relative importance of acoustic and textual data in low-resource language ASR and proposes adaptation strategies utilizing related languages and shared features.

Findings

01 Shared linguistic features aid in model adaptation.

02 Acoustic data can compensate for limited text resources.

03 Methodology applicable to other low-resource languages.

Abstract

Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multilingual training as in Whisper. A huge challenge still exists for low-resource languages, where the availability of both audio and text is limited. This is further complicated by the presence of multiple dialects, as in Indian languages. However, many Indian languages can be grouped into the same families and share the same script and grammatical structure. Here, adaptation and fine-tuning techniques can be applied to overcome the low-resource nature of the data by utilising well-resourced similar languages. In such scenarios, it is important to understand the extent to which each modality, such as acoustics and text, matters in building a reliable ASR system. It could be the…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing