Whisper Finetuning on Nepali Language

Sanjay Rijal; Shital Adhikari; Manish Dahal; Manish Awale; Vaghawan Ojha

arXiv:2411.12587·cs.CL·November 20, 2024

TL;DR

This paper demonstrates that fine-tuning OpenAI's Whisper models on a diverse, curated Nepali speech dataset significantly improves transcription accuracy, highlighting the importance of data quality and augmentation for underrepresented languages.

Contribution

It introduces a comprehensive Nepali speech dataset and shows that fine-tuning Whisper models with this data reduces WER and enhances robustness for Nepali ASR.

Findings

01. WER reduced by up to 36.2% on small models

02. Data augmentation improves model robustness

03. Curated dataset outperforms baseline models
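For readers unfamiliar with the metric behind the first finding, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are illustrative and do not come from the paper, and the page does not say whether the 36.2% figure is a relative or an absolute reduction; the helper below shows the relative form only as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Relative WER reduction (assumed interpretation, not confirmed by the source)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

In practice the transcripts would usually be normalized (punctuation, numerals, Unicode forms) before scoring; that step is omitted here.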

Abstract

Despite growing advances in Automatic Speech Recognition (ASR), developing robust models for underrepresented languages such as Nepali remains a challenge. This research focuses on building an exhaustive and generalized dataset and then fine-tuning OpenAI's Whisper models of different sizes to improve transcription (speech-to-text) accuracy for Nepali. We combine publicly available ASR datasets with self-recorded custom datasets covering a diverse range of accents, dialects, and speaking styles, further enriched through augmentation. Our experimental results demonstrate that fine-tuning Whisper models on our curated custom dataset substantially reduces the Word Error Rate (WER) across all model sizes, which we attribute to greater data variation in speaker age, gender, and sentiment, acoustic environment, dialect, denser audio segments (15-30 seconds)…
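The abstract mentions augmentation but this page does not say which techniques were used. As one common illustration only, the sketch below mixes Gaussian noise into a waveform at a chosen signal-to-noise ratio. The function name `add_noise_at_snr`, the choice of additive Gaussian noise, and the 10 dB example value are all assumptions made for this example and should not be read as the authors' pipeline.

```python
import math
import random


def add_noise_at_snr(samples, snr_db, rng=None):
    """Return a copy of `samples` with Gaussian noise added at `snr_db` dB SNR.

    Illustrative augmentation only; the source does not specify its methods.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible

    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# One second of a 440 Hz tone at 16 kHz, used as a stand-in for speech.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_noise_at_snr(clean, snr_db=10)
```

A real pipeline would typically operate on arrays loaded from audio files and might also vary speed, pitch, or background recordings; those are left out to keep the sketch dependency-free.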

Taxonomy

Topics: Hate Speech and Cyberbullying Detection