ManWav: The First Manchu ASR Model

Jean Seo; Minha Kang; Sungjoo Byun; Sangah Lee

arXiv:2406.13502·cs.CL·June 21, 2024

TL;DR

This paper introduces ManWav, the first automatic speech recognition (ASR) model for the critically endangered Manchu language. The model is built by fine-tuning Wav2Vec2-XLSR-53, and fine-tuning on augmented data lowers both the character error rate (CER) and the word error rate (WER).

Contribution

It presents the first Manchu ASR model and shows that fine-tuning Wav2Vec2-XLSR-53 with augmented data improves recognition for this extremely low-resource language.

Findings

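01 Fine-tuning with augmented data lowers CER by 0.02 compared with fine-tuning the same base model on the original data

02 Fine-tuning with augmented data lowers WER by 0.13 compared with fine-tuning the same base model on the original data

03 ManWav is the first ASR model for the Manchu language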

Abstract

This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high-resource and extremely low-resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce ManWav, the first-ever Manchu ASR model, leveraging Wav2Vec2-XLSR-53. The results of this first Manchu ASR model are promising, especially when it is trained with our augmented data. Wav2Vec2-XLSR-53 fine-tuned with augmented data demonstrates a 0.02 drop in CER and a 0.13 drop in WER compared to the same base model fine-tuned with the original data.
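The two metrics quoted above can be illustrated with a minimal sketch. It assumes the standard definitions (edit distance between reference and hypothesis, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER); the paper's own evaluation code is not shown on this page, so details such as whether spaces count as characters are assumptions. The function names and the example strings are hypothetical placeholders, not Manchu data.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder transcripts for illustration only.
reference = "alpha beta gamma delta"
hypothesis = "alpha beta gama delta"
print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

In this toy pair one word out of four is wrong while only one character out of twenty-two differs, which shows why WER is usually much larger than CER for the same output and why a change in one need not match the change in the other.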

Code & Models

Models

seemdog/ManWav (Hugging Face model; 5 downloads, 3 likes)

Taxonomy

Topics: Robotics and Automated Systems

Methods: Balanced Selection · Focus