CLSRIL-23: Cross Lingual Speech Representations for Indic Languages
Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

TL;DR
CLSRIL-23 is a self-supervised, multilingual speech model trained on 23 Indic languages that improves speech recognition performance, especially for low-resource languages, by learning shared phonetic representations.
Contribution
The paper introduces CLSRIL-23, a novel self-supervised model for cross-lingual speech representation learning across 23 Indic languages, leveraging multilingual pretraining to enhance downstream speech recognition tasks.
Findings
Multilingual pretraining outperforms monolingual training.
5% reduction in WER for Hindi speech recognition.
9.5% reduction in CER for Hindi speech recognition.
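For readers unfamiliar with the two metrics above, here is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed from an edit distance. The function names are illustrative and not taken from the paper. The sketch assumes whitespace tokenization for words and counts Unicode code points (including spaces) for characters, which may differ from the authors' evaluation setup. The summary does not state whether the reported reductions are absolute or relative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces count as characters and each Unicode code point is
    one character; other conventions exist.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Because both rates are normalized by the reference length, they can exceed 1.0 when the hypothesis contains many insertions.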
Abstract
We present CLSRIL-23, a self-supervised, audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across all languages. We compare the language-wise loss during pretraining to assess the effects of monolingual and multilingual pretraining. Performance on some downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual training, both in learning speech representations that encode the phonetic similarity of languages and in performance on downstream tasks. A decrease of 5% is observed in WER and 9.5% in CER when a multilingual pretrained…
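The contrastive task mentioned in the abstract can be illustrated with a small sketch. It follows the general form of the wav2vec 2.0 contrastive term: for a masked time step, the context vector should be more similar to the true quantized latent than to a set of distractors. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the toy two-dimensional vectors, and the default temperature of 0.1. The full training objective also includes terms not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of picking the true target among candidates.

    context     : context-network output at one masked time step
    target      : the true quantized latent for that step
    distractors : quantized latents sampled from other masked steps
    temperature : illustrative value, assumed rather than taken from this paper
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log( exp(logit_target) / sum_k exp(logit_k) )
    return log_denominator - logits[0]


if __name__ == "__main__":
    context = [1.0, 0.0]
    good_target = [0.9, 0.1]      # close to the context vector
    distractors = [[0.0, 1.0], [-1.0, 0.0]]
    print(contrastive_loss(context, good_target, distractors))
```

The loss approaches zero when the context vector aligns with the true latent and is far from every distractor, and it grows when a distractor is a closer match than the target.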
Code & Models
- 🤗 Harveenchadha/vakyansh-wav2vec2-hindi-him-4200 · model · 653 dl · ♡ 5
- 🤗 Harveenchadha/vakyansh-wav2vec2-punjabi-pam-10 · model · 29 dl · ♡ 2
- 🤗 Harveenchadha/vakyansh-wav2vec2-tamil-tam-250 · model · 76k dl · ♡ 4
- 🤗 Harveenchadha/vakyansh_hindi_base_pretrained · model · 23 dl · ♡ 1
- 🤗 Harveenchadha/wav2vec2-pretrained-clsril-23-10k · model · 72 dl · ♡ 6
- 🤗 nikhilanvekar2001/Hindi_asr_with_LM · model · 2 dl
- 🤗 nikhilanvekar2001/Hindi_asr · model · 2 dl
- 🤗 nikhilanvekar2001/Hindi_asr_5gram_model · model · 3 dl
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
