Automatic Speech Recognition for the Ika Language
Uchenna Nzenwata, Daniel Ogbuigwe

TL;DR
This paper shows that fine-tuning large multilingual pretrained speech models on a small amount of Ika speech data (just over 1 hour) can produce a working ASR system, and it identifies overfitting to the small training set as the main challenge.
Contribution
It shows that fine-tuning pretrained wav2vec 2.0 models is effective for low-resource language ASR and compares the 300-million-parameter and 1-billion-parameter models.
Findings
The larger 1-billion-parameter model outperforms the smaller 300-million-parameter model.
Fine-tuning achieves a WER of 0.5377 and a CER of 0.2651 with just over 1 hour of training data.
The models overfit the small training dataset, which reduces generalizability.
Abstract
We present a cost-effective approach for developing Automatic Speech Recognition (ASR) models for low-resource languages like Ika. We fine-tune the pretrained wav2vec 2.0 Massively Multilingual Speech Models on a high-quality speech dataset compiled from New Testament Bible translations in Ika. Our results show that fine-tuning multilingual pretrained models achieves a Word Error Rate (WER) of 0.5377 and Character Error Rate (CER) of 0.2651 with just over 1 hour of training data. The larger 1 billion parameter model outperforms the smaller 300 million parameter model due to its greater complexity and ability to store richer speech representations. However, we observe overfitting to the small training dataset, reducing generalizability. Our findings demonstrate the potential of leveraging multilingual pretrained models for low-resource languages. Future work should focus on expanding the…
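The WER and CER figures above are error rates expressed as fractions (0.5377 is roughly 54 errors per 100 reference words). As a minimal sketch of how such scores are conventionally computed, assuming the standard edit-distance definition and not the authors' actual evaluation code, the functions below divide the Levenshtein distance between reference and hypothesis by the reference length, counted in words for WER and in characters for CER. The function names and the English toy sentences are hypothetical and are not taken from the Ika dataset.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters; some toolkits strip them.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy example, not drawn from the paper's data.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {wer(reference, hypothesis):.4f}")
    print(f"CER = {cer(reference, hypothesis):.4f}")
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 1.0. CER is typically lower than WER for the same output, as in the reported results, since a single wrong character makes the whole word count as an error.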
Taxonomy
Topics: Speech and dialogue systems
