A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

Cheng-Kang Chou; Chan-Jan Hsu; Ho-Lam Chung; Liang-Hsuan Tseng; Hsi-Chun Cheng; Yu-Kuan Fu; Kuan Po Huang; Hung-Yi Lee

arXiv:2506.11130 · cs.CL · June 17, 2025

TL;DR

This paper introduces a self-refining framework that improves automatic speech recognition (ASR) by iteratively generating pseudo-labels, synthesizing speech with TTS, and retraining the model. It is demonstrated on Taiwanese Mandarin, where it yields significant error reductions.

Contribution

The paper presents a novel self-refining cycle that combines pseudo-labeling and TTS synthesis to enhance ASR performance without human-transcribed speech, targeting low-resource languages.

Findings

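1. Achieved up to 20% error reduction on Mandarin ASR benchmarks compared to Whisper.
2. Reduced error rates by up to 50% on Mandarin-English code-switching benchmarks compared to Whisper.
3. Demonstrated the framework with 6,000 hours of unlabeled speech, adapting Whisper-large-v2 into the specialized model Twister.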
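The findings report error reductions against Whisper, but this page does not define the metric. A common convention, assumed here rather than taken from the paper, scores code-switched speech with a mixed error rate (MER) that counts each Chinese character and each English word as one token, and reports improvement as a relative reduction, (baseline - new) / baseline. The sketch below implements that convention; `tokenize_mixed`, `mixed_error_rate`, and `relative_reduction` are hypothetical helpers, not the authors' evaluation code.

```python
import re

# Assumed tokenization for Mandarin-English text: every CJK character is a token,
# every run of Latin letters/digits/apostrophes is one word token.
_TOKEN_RE = re.compile("[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text: str) -> list[str]:
    return _TOKEN_RE.findall(text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Token-level Levenshtein distance; substitution, insertion, deletion each cost 1.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def mixed_error_rate(refs: list[str], hyps: list[str]) -> float:
    # Total token errors divided by total reference tokens (corpus-level MER).
    errors = sum(edit_distance(tokenize_mixed(r), tokenize_mixed(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize_mixed(r)) for r in refs)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


def relative_reduction(baseline: float, new: float) -> float:
    # Fraction of the baseline error removed; 0.5 means the error rate was halved.
    return (baseline - new) / baseline
```

Under this reading, a 50% reduction means, for example, an error rate falling from 0.10 to 0.05, i.e. `relative_reduction(0.10, 0.05)` is 0.5. An absolute reduction of 50 points would be a different claim, so the paper's own metric definition should be checked before comparing numbers.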

Abstract

We propose a self-refining framework that enhances ASR performance using only unlabeled datasets. The process starts with an existing ASR model generating pseudo-labels on unannotated speech, which are then used to train a high-fidelity text-to-speech (TTS) system. The synthesized speech-text pairs are then bootstrapped into the original ASR system, completing the closed-loop self-improvement cycle. We demonstrate the effectiveness of the framework on Taiwanese Mandarin speech. Leveraging 6,000 hours of unlabeled speech, a moderate amount of text data, and synthetic content from AI models, we adapt Whisper-large-v2 into a specialized model, Twister. Twister reduces error rates by up to 20% on Mandarin and 50% on Mandarin-English code-switching benchmarks compared to Whisper. Results highlight the framework as a compelling alternative to pseudo-labeling self-distillation approaches and…
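The closed loop described in the abstract can be sketched as a function over four pluggable stages. This is a minimal sketch, not the authors' implementation: the `Stages` container and its `transcribe`, `train_tts`, `synthesize`, and `finetune_asr` callables are hypothetical stand-ins for real Whisper and TTS training code, and the `rounds` parameter (repeating the cycle, as the TL;DR's "iteratively" suggests) is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

Audio = list[float]  # placeholder waveform type (assumption)


@dataclass
class Stages:
    # Hypothetical stage callables; none of these are APIs from the paper.
    transcribe: Callable[[Any, Audio], str]                     # ASR model + speech -> pseudo-label
    train_tts: Callable[[list[tuple[Audio, str]]], Any]         # pseudo-labeled pairs -> TTS model
    synthesize: Callable[[Any, str], Audio]                     # TTS model + text -> synthetic speech
    finetune_asr: Callable[[Any, list[tuple[Audio, str]]], Any]  # ASR + synthetic pairs -> updated ASR


def self_refine(asr: Any, unlabeled: Sequence[Audio], texts: Sequence[str],
                stages: Stages, rounds: int = 1) -> Any:
    for _ in range(rounds):
        # 1. Pseudo-label the unannotated speech with the current ASR model.
        pseudo_pairs = [(wav, stages.transcribe(asr, wav)) for wav in unlabeled]
        # 2. Train a TTS system on the pseudo-labeled (speech, text) pairs.
        tts = stages.train_tts(pseudo_pairs)
        # 3. Synthesize speech for the text corpus to get synthetic (speech, text) pairs.
        synthetic = [(stages.synthesize(tts, text), text) for text in texts]
        # 4. Fine-tune the ASR model on the synthetic pairs, closing the loop.
        asr = stages.finetune_asr(asr, synthetic)
    return asr
```

Keeping the stages as injected callables separates the control flow of the cycle from model-specific training code; in practice each stage would wrap a real ASR or TTS toolkit.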

