PhoWhisper: Automatic Speech Recognition for Vietnamese
Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

TL;DR
PhoWhisper is a Vietnamese ASR system based on fine-tuning Whisper, achieving state-of-the-art results on benchmark datasets by leveraging a diverse 844-hour Vietnamese speech corpus.
Contribution
The paper introduces PhoWhisper, a Vietnamese ASR model fine-tuned from Whisper on a diverse 844-hour Vietnamese speech dataset, and reports state-of-the-art results on benchmark Vietnamese ASR datasets.
Findings
State-of-the-art accuracy on Vietnamese ASR benchmarks
Robustness across diverse Vietnamese accents
Open-source implementation available
Abstract
We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
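ASR benchmark results like those mentioned above are commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. This page does not state which metric the paper uses, so treat WER here as an assumption. The sketch below is a minimal illustration; the function name `word_error_rate` and the sample sentences are hypothetical, and the code does no text normalization (casing, punctuation) of the kind real evaluations usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "tôi đang học tiếng việt"
    hypothesis = "tôi học tiếng anh"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A lower value is better, and the score can exceed 1.0 when the hypothesis contains many inserted words.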
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
