Bridging the Language Gap: Synthetic Voice Diversity via Latent Mixup for Equitable Speech Recognition
Wesley Bian, Xiaofeng Lin, Guang Cheng

TL;DR
This paper presents a novel data augmentation method using latent mixup to improve speech recognition accuracy for low-resource languages, addressing data scarcity issues and promoting linguistic equity.
Contribution
The paper introduces a new latent mixup augmentation technique that enhances speech recognition models for underrepresented languages, outperforming existing methods.
Findings
Significant performance improvements on low-resource languages
Outperforms existing augmentation strategies
Practical solution for underrepresented linguistic communities
Abstract
Modern machine learning models for audio tasks often exhibit superior performance on English and other well-resourced languages, primarily due to the abundance of available training data. This disparity leads to an unfair performance gap for low-resource languages, where data collection is both challenging and costly. In this work, we introduce a novel data augmentation technique for speech corpora designed to mitigate this gap. Through comprehensive experiments, we demonstrate that our method significantly improves the performance of automatic speech recognition systems on low-resource languages. Furthermore, we show that our approach outperforms existing augmentation strategies, offering a practical solution for enhancing speech technology in underrepresented linguistic communities.
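The abstract does not spell out how the augmentation works, so the following is only a minimal sketch of generic mixup applied to latent vectors, not the authors' implementation. It assumes two equal-length latent vectors (for example, embeddings of two utterances or voices) and draws the mixing weight from a Beta(alpha, alpha) distribution, as in standard mixup. The function name `latent_mixup`, the `alpha` default, and the example vectors are all hypothetical.

```python
import random


def latent_mixup(z_a, z_b, alpha=0.4, rng=None):
    """Convexly combine two equal-length latent vectors (generic mixup).

    Assumption: this is textbook mixup in a latent space, used here only
    to illustrate the idea named in the title; the paper's actual
    procedure may differ.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    rng = rng or random.Random()
    # Mixing weight in [0, 1]; small alpha pushes it toward 0 or 1.
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]
    return mixed, lam


if __name__ == "__main__":
    # Hypothetical 4-dimensional latents for two source voices.
    voice_a = [0.2, -1.0, 0.5, 0.0]
    voice_b = [1.0, 0.3, -0.5, 2.0]
    mixed, lam = latent_mixup(voice_a, voice_b, rng=random.Random(0))
    print(f"lambda = {lam:.3f}")
    print("mixed latent:", [round(v, 3) for v in mixed])
```

Each mixed vector lies on the line segment between its two sources, so repeated sampling would yield a family of interpolated latents that a downstream decoder or synthesizer could, in principle, turn into additional training examples.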
Taxonomy
Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Speech and Audio Processing
