Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project
Khai Le-Duc

TL;DR
This paper explores unsupervised pre-training methods for Vietnamese ASR in the medical domain, focusing on low-resource scenarios and comparing publicly available models with custom pre-trained models.
Contribution
It investigates the effectiveness of unsupervised pre-training and data strategies for Vietnamese medical speech recognition, a low-resource language task.
Findings
Unsupervised pre-training improves ASR performance in low-resource settings.
Customized pre-trained models outperform publicly available models.
Data combination strategies enhance recognition accuracy.
Abstract
In today's interconnected world, moving abroad is increasingly common, whether for employment, refugee resettlement, or other reasons. Language barriers between natives and immigrants are an everyday problem, especially in the medical domain. They can make it difficult for patients and doctors to communicate during anamnesis or in the emergency room, which compromises patient care. The goal of the HYKIST Project is to develop a speech translation system that supports patient-doctor communication using automatic speech recognition (ASR) and machine translation (MT). ASR systems have recently shown strong performance on tasks for which sufficient training data is available, such as LibriSpeech. Building a good model nevertheless remains difficult because of the variety of speaking styles, acoustic and recording conditions, and the lack of in-domain training data. In this thesis, we describe our efforts to…
