Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin
Bryan Li, Xinyue Wang, Homayoon Beigi

TL;DR
This paper introduces a transfer learning approach from Mandarin to Cantonese for automatic speech recognition (ASR), enabling faster training with less data and a slight improvement in character error rate (CER) for low-resource Cantonese ASR systems.
Contribution
The study demonstrates weight transfer from a Mandarin time-delayed neural network to a newly initialized Cantonese model, reducing training time and slightly improving CER.
Findings
Transfer learning reduces training time for Cantonese ASR.
At every epoch, transfer learning models show a smaller log-probability than a Cantonese-only model.
Slight CER improvements observed with transfer learning.
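For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. The summary does not say how the authors computed it, so the following is only a minimal sketch of that standard definition; the function names `edit_distance` and `cer` are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because Cantonese is written without spaces between words, a character-level metric avoids having to segment the transcript before scoring.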
Abstract
We propose a system to develop a basic automatic speech recognizer (ASR) for Cantonese, a low-resource language, through transfer learning from Mandarin, a high-resource language. We take a time-delayed neural network trained on Mandarin, and perform weight transfer of several layers to a newly initialized model for Cantonese. We experiment with the number of layers transferred, their learning rates, and pretraining i-vectors. A key finding is that this approach allows for quicker training with less data. We find that for every epoch, log-probability is smaller for transfer learning models compared to a Cantonese-only model. The transfer learning models show a slight improvement in character error rate (CER).
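The weight-transfer step described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the toy model representation, the layer sizes, the learning-rate values, and the names `init_model` and `transfer_layers` are all assumptions made for illustration. It copies the first few layers of a source (Mandarin) model into a freshly initialized target (Cantonese) model and assigns the transferred layers their own learning rate.

```python
import random


def init_model(layer_sizes, seed):
    """Toy model: ordered dict mapping layer name -> weight matrix (list of rows)."""
    rng = random.Random(seed)
    model = {}
    for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        model[f"layer{i}"] = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                              for _ in range(n_out)]
    return model


def transfer_layers(source, target, num_layers, transferred_lr, base_lr):
    """Copy the first num_layers layers of source into target (in place).

    Returns a dict of per-layer learning rates: transferred layers get
    transferred_lr, newly initialized layers get base_lr.
    """
    learning_rates = {}
    for i, name in enumerate(target):
        if i < num_layers:
            src, dst = source[name], target[name]
            if len(src) != len(dst) or len(src[0]) != len(dst[0]):
                raise ValueError(f"shape mismatch in {name}")
            target[name] = [row[:] for row in src]  # copy, do not alias
            learning_rates[name] = transferred_lr
        else:
            learning_rates[name] = base_lr
    return learning_rates


# Hypothetical sizes: shared 40-dim input and hidden widths, different output sizes.
mandarin = init_model([40, 64, 64, 300], seed=0)
cantonese = init_model([40, 64, 64, 200], seed=1)
rates = transfer_layers(mandarin, cantonese, num_layers=2,
                        transferred_lr=0.0001, base_lr=0.001)
```

Varying `num_layers` and `transferred_lr` in this sketch corresponds to the two knobs the abstract mentions: how many layers are transferred and what learning rate they receive. The output layer stays newly initialized here because the two languages would not share an output inventory.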
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
