Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation

Natsuo Yamashita; Koichi Nagatsuka; Hiroaki Kokubo; Kota Dohi; Tuan Vu Ho

arXiv:2603.16920 · eess.AS · March 19, 2026

Open Access

TL;DR

This paper introduces a novel synthetic data augmentation framework for domain adaptation in end-to-end ASR, combining LLM-based text augmentation with phonetic respelling to improve robustness on domain-specific data.

Contribution

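The paper presents phonetic respelling augmentation (PRA), a new method that adds pronunciation variability through LLM-generated pseudo-spellings, and an LLM-based text augmentation pipeline with filtering, both aimed at better domain adaptation in ASR.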

Findings

1. Consistent WER reductions across four datasets.
2. Enhanced lexical diversity and pronunciation variability.
3. Improved robustness of ASR models on domain-specific data.

Abstract

End-to-end automatic speech recognition often degrades on domain-specific data due to scarce in-domain resources. We propose a synthetic-data-based domain adaptation framework with two contributions: (1) a large language model (LLM)-based text augmentation pipeline with a filtering strategy that balances lexical diversity, perplexity, and domain-term coverage, and (2) phonetic respelling augmentation (PRA), a novel method that introduces pronunciation variability through LLM-generated orthographic pseudo-spellings. Unlike conventional acoustic-level methods such as SpecAugment, PRA provides phonetic diversity before speech synthesis, enabling synthetic speech to better approximate real-world variability. Experimental results across four domain-specific datasets demonstrate consistent reductions in word error rate, confirming that combining domain-specific lexical coverage with realistic…
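
The filtering step named in the abstract can be illustrated with a small sketch. The abstract says only that the filter balances lexical diversity, perplexity, and domain-term coverage; everything else here is an assumption made for illustration, not the authors' method: type-token ratio as the diversity measure, an add-one-smoothed unigram model as a stand-in for a real language model, hard thresholds as the balancing rule, and all function names, parameters, and sample sentences.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization with edge punctuation stripped."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]


def type_token_ratio(tokens):
    """Lexical diversity proxy (assumed): distinct tokens / total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_perplexity(tokens, counts, total, vocab_size):
    """Perplexity under an add-one-smoothed unigram model (stand-in LM)."""
    if not tokens:
        return float("inf")
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))


def domain_coverage(tokens, domain_terms):
    """Fraction of the domain terms that appear in the sentence."""
    if not domain_terms:
        return 0.0
    present = set(tokens)
    return sum(1 for term in domain_terms if term in present) / len(domain_terms)


def filter_candidates(candidates, reference_corpus, domain_terms,
                      max_ppl, min_ttr, min_cov):
    """Keep candidates that pass all three (hypothetical) thresholds."""
    ref_tokens = [t for s in reference_corpus for t in tokenize(s)]
    counts = Counter(ref_tokens)
    total = len(ref_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    kept = []
    for text in candidates:
        tokens = tokenize(text)
        if (type_token_ratio(tokens) >= min_ttr
                and domain_coverage(tokens, domain_terms) >= min_cov
                and unigram_perplexity(tokens, counts, total, vocab_size) <= max_ppl):
            kept.append(text)
    return kept


# Illustrative data only; none of it comes from the paper.
reference_corpus = [
    "the patient received a dose of insulin",
    "the nurse checked the blood pressure",
]
domain_terms = {"insulin", "dose"}
candidates = [
    "the nurse gave the patient a dose of insulin",
    "the the the the the",
    "stock prices fell sharply today",
]
kept = filter_candidates(candidates, reference_corpus, domain_terms,
                         max_ppl=1000.0, min_ttr=0.5, min_cov=0.5)
```

In this toy setup the repetitive sentence fails the diversity threshold and the off-domain sentence fails the coverage threshold. A real pipeline would likely score perplexity with a neural language model and might combine the three signals into a weighted score instead of hard cutoffs.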

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing