Bridging the Language Gap: Synthetic Voice Diversity via Latent Mixup for Equitable Speech Recognition

Wesley Bian; Xiaofeng Lin; Guang Cheng

arXiv:2511.20534 · cs.CL · November 26, 2025

Open Access

TL;DR

This paper presents a novel data augmentation method using latent mixup to improve speech recognition accuracy for low-resource languages, addressing data scarcity issues and promoting linguistic equity.

Contribution

The paper introduces a latent mixup augmentation technique for speech corpora that improves automatic speech recognition on underrepresented languages and outperforms existing augmentation strategies.

Findings

01. Significant performance improvements on low-resource languages

02. Outperforms existing augmentation strategies

03. Practical solution for underrepresented linguistic communities

Abstract

Modern machine learning models for audio tasks often exhibit superior performance on English and other well-resourced languages, primarily due to the abundance of available training data. This disparity leads to an unfair performance gap for low-resource languages, where data collection is both challenging and costly. In this work, we introduce a novel data augmentation technique for speech corpora designed to mitigate this gap. Through comprehensive experiments, we demonstrate that our method significantly improves the performance of automatic speech recognition systems on low-resource languages. Furthermore, we show that our approach outperforms existing augmentation strategies, offering a practical solution for enhancing speech technology in underrepresented linguistic communities.
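
The abstract does not spell out how the augmentation works, so the following is only a minimal sketch of generic mixup applied to latent vectors, as the title's "latent mixup" suggests. It is not the authors' implementation. The function name `latent_mixup`, the `alpha` value, and the toy three-dimensional "voice" vectors are all hypothetical and chosen for illustration.

```python
import random


def latent_mixup(z_a, z_b, alpha=0.4, rng=None):
    """Blend two equal-length latent vectors with a random convex weight.

    Generic mixup (an assumption, not the paper's confirmed method):
        lam ~ Beta(alpha, alpha)
        z   = lam * z_a + (1 - lam) * z_b
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]
    return mixed, lam


# Hypothetical latent representations of two voices (toy values).
rng = random.Random(0)
voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 4.0]
mixed, lam = latent_mixup(voice_a, voice_b, alpha=0.4, rng=rng)
```

In a sketch like this, each call yields a new point on the line segment between the two latent vectors, which is one plausible way to synthesize additional voice variation from a small set of recordings. How the latents are obtained and how mixed latents are turned back into training audio are left open here, since the abstract does not state them.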

Taxonomy

Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Speech and Audio Processing