PhoWhisper: Automatic Speech Recognition for Vietnamese

Thanh-Thien Le; Linh The Nguyen; Dat Quoc Nguyen

arXiv:2406.02555 · eess.AS · June 6, 2024 · 1 citation

TL;DR

PhoWhisper is a Vietnamese ASR system built by fine-tuning Whisper on an 844-hour speech corpus that covers diverse Vietnamese accents. It achieves state-of-the-art results on benchmark Vietnamese ASR datasets.

Contribution

The paper introduces PhoWhisper, a Vietnamese ASR model released in five versions and fine-tuned from Whisper on an 844-hour dataset of diverse Vietnamese accents, and reports state-of-the-art performance on benchmark Vietnamese ASR datasets.

Findings

01 State-of-the-art accuracy on Vietnamese ASR benchmarks

02 Robustness across diverse Vietnamese accents

03 Open-source implementation available

Abstract

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates the state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
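
The abstract does not name the metric behind its benchmark results. ASR benchmarks are commonly scored by word error rate (WER), so the sketch below assumes that metric. The function name `word_error_rate` and the Vietnamese sample sentences are illustrative only and are not taken from the paper or its repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts (assumed, not from any benchmark):
# the hypothesis drops one of four reference words.
example = word_error_rate("xin chào các bạn", "xin chào bạn")
print(example)  # 0.25
```

Lower is better, and 0.0 means the hypothesis matches the reference exactly. Published evaluations usually normalize casing and punctuation before scoring; this sketch only splits on whitespace.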

Code & Models

Repositories

VinAIResearch/PhoWhisper (official): https://github.com/VinAIResearch/PhoWhisper

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques