ManWav: The First Manchu ASR Model
Jean Seo, Minha Kang, Sungjoo Byun, Sangah Lee

TL;DR
This paper introduces ManWav, the first automatic speech recognition model for the endangered Manchu language, utilizing Wav2Vec2-XLSR-53 and data augmentation to improve recognition accuracy.
Contribution
It presents the first Manchu ASR model, demonstrating the effectiveness of fine-tuning Wav2Vec2-XLSR-53 with augmented data for low-resource languages.
Findings
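Fine-tuning with augmented data reduces CER by 0.02 compared with fine-tuning the same base model on the original data
Fine-tuning with augmented data reduces WER by 0.13 compared with fine-tuning the same base model on the original data
First ASR model for the Manchu language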
Abstract
This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high-resource and extremely low-resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce ManWav, the first-ever Manchu ASR model, leveraging Wav2Vec2-XLSR-53. The results of the first Manchu ASR model are promising, especially when it is trained with our augmented data. Wav2Vec2-XLSR-53 fine-tuned with augmented data demonstrates a 0.02 drop in CER and a 0.13 drop in WER compared to the same base model fine-tuned with original data.
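For readers unfamiliar with the two metrics, CER (character error rate) and WER (word error rate) are conventionally computed as the Levenshtein edit distance between a reference transcript and the model's hypothesis, divided by the reference length in characters or words. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names are hypothetical, the example strings are made-up placeholders rather than Manchu data, and counting spaces as characters in CER is an assumption the paper summary does not confirm.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings for illustration only.
print(wer("the cat sat", "the cat sit"))
print(cer("abcd", "abed"))
```

Both metrics are error rates, so lower is better, and a "0.13 drop in WER" means the rate itself decreased by 0.13 in absolute terms on this 0-to-1 scale.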
Taxonomy
Topics: Robotics and Automated Systems
Methods: Balanced Selection · Focus
