Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages
Or Haim Anidjar, Revital Marbel, Roi Yozevitch

TL;DR
This paper enhances Wav2Vec2-based speech recognition for low-resource languages by applying data augmentation, resulting in significant improvements over existing models like Whisper and baseline Wav2Vec2, especially across dialects and diacritics.
Contribution
It introduces a novel end-to-end framework that adds data augmentation to Wav2Vec2 fine-tuning, improving ASR performance in low-resource, dialect-rich languages.
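The summary does not say which augmentation techniques the framework uses. As an illustrative assumption only, the sketch below shows two common waveform-level augmentations for ASR fine-tuning: additive Gaussian noise at a random signal-to-noise ratio, and speed perturbation. Each produces an extra training variant of a labeled clip whose transcript stays unchanged. The function names, speed factors, and SNR range are hypothetical and are not the authors' settings.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise so the result has the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def speed_perturb(wave, factor):
    """Change playback speed by `factor` using linear-interpolation resampling.

    factor > 1 shortens the clip (faster speech) and factor < 1 lengthens it.
    Pitch shifts along with speed, as in classic speed perturbation.
    """
    n_out = int(round(len(wave) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(wave)), wave)


def augment(wave, rng, speeds=(0.9, 1.0, 1.1), snr_range=(10.0, 30.0)):
    """Return one randomly perturbed copy of `wave` (hypothetical policy)."""
    out = speed_perturb(wave, rng.choice(speeds))
    return add_noise(out, rng.uniform(*snr_range), rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1 s of a 440 Hz tone at 16 kHz (the sample rate Wav2Vec2 models expect)
    # stands in for a real utterance.
    wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
    variants = [augment(wave, rng) for _ in range(3)]  # three extra training copies
    print([len(v) for v in variants])
```

Speed perturbation changes duration and pitch together, while additive noise simulates different recording conditions. Both keep the original label, which makes them inexpensive ways to enlarge a small labeled set.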
Findings
33.9% relative improvement in Word Error Rate
53.2% relative improvement in Character Error Rate
Robustness to different diacritics
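Both headline numbers are relative reductions of standard error rates. Below is a minimal sketch of the usual definitions. It assumes plain whitespace tokenization for WER and raw characters (spaces included) for CER. The authors' exact scoring script and text normalization are not given here, and `strip_diacritics` is a hypothetical helper for comparing transcripts with and without diacritics.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word Error Rate: word-level edit distance divided by the reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character Error Rate: character-level edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def strip_diacritics(text):
    """Remove standalone combining marks such as Arabic harakat (U+064B to U+0652).

    Precomposed letters (e.g. Portuguese 'ç' or Arabic 'أ') are left untouched.
    """
    return "".join(c for c in text if not unicodedata.combining(c))


def relative_improvement(baseline, new):
    """Relative reduction of an error rate, e.g. 0.30 -> 0.20 is a one-third improvement."""
    return (baseline - new) / baseline
```

For example, computing `cer(strip_diacritics(ref), strip_diacritics(hyp))` alongside `cer(ref, hyp)` separates letter errors from diacritic errors. This is one plausible way to probe the diacritic robustness listed above.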
Abstract
Approaching Speech-to-Text and Automatic Speech Recognition (ASR) problems in low-resource languages is notoriously challenging due to the scarcity of validated datasets and the diversity of dialects. Arabic, Russian, and Portuguese exemplify these difficulties: they are low-resource languages owing to their many dialects, spoken across different continents. Moreover, the variety of accents and pronunciations in these languages makes accurate ASR harder to achieve. With the increasing popularity of Deep Learning and Transformers, acoustic models like the renowned Wav2Vec2 have outperformed earlier state-of-the-art approaches in Speech Recognition. However, although Wav2Vec2 is more efficient than traditional methods and requires significantly less labeled data, its performance declines significantly for under-represented languages. This…
