Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing

Yurii Halychanskyi; Nimet Beyza Bozdag; Mark Hasegawa-Johnson; Dilek Hakkani-Tür; Volodymyr Kindratenko

arXiv:2604.27273·cs.SD·May 1, 2026

TL;DR

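This paper introduces a low-resource accent adaptation pipeline for ASR. It adapts a TTS decoder to a target-accent speaker using fewer than ten reference utterances, uses LLM-guided phoneme editing to generate synthetic accented speech, and fine-tunes an ASR model on that speech, reducing WER on real accented speech.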

Contribution

The paper combines few-shot TTS decoder adaptation with LLM-guided phoneme editing to model accents in extremely low-resource scenarios.

Findings

01. Consistent WER reductions on real accented speech.

02. LLM-guided phoneme edits outperform random perturbations.

03. Effective in ultra-low data regimes.
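
The findings above are stated in terms of WER. As a reminder of how that metric is conventionally defined, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only; they are not data from the paper.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

A WER reduction then means this ratio is lower after fine-tuning on the synthetic speech than before, measured on the same real accented test utterances.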

Abstract

Accented automatic speech recognition (ASR) often degrades due to the limited availability of accented training data. Prior work has explored accent modeling in low-resource settings, but existing approaches typically require minutes to hours of labeled speech, which may still be impractical for truly scarce accent scenarios. We propose a pipeline that adapts a text-to-speech (TTS) decoder to a target-accent speaker using fewer than ten reference utterances and employs large language model (LLM)-based phoneme editing to generate accent-conditioned pronunciations. The resulting synthetic speech is used to fine-tune a self-supervised ASR model. Experiments demonstrate consistent word error rate (WER) reductions on real accented speech, including cross-speaker evaluation and ultra-low data regimes. A matched-rate random phoneme baseline shows that phoneme-space perturbation itself is a…
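
The abstract's final sentence is truncated on this page, so the exact design of the matched-rate random phoneme baseline is not available here. One plausible reading is that random phoneme substitutions are applied at the same edit rate as the LLM-guided edits, so the two conditions differ only in which phonemes change. The sketch below illustrates that reading under stated assumptions: the toy phoneme inventory, the substitution-only edits, and all function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy inventory (ARPAbet-style symbols); the paper's phoneme set is not given here.
PHONEMES = ["AA", "AE", "AH", "IY", "UW", "D", "T", "DH", "TH", "R", "Z", "S"]


def edit_rate(original: list[str], edited: list[str]) -> float:
    """Fraction of positions that differ (assumes substitution-only edits)."""
    if len(original) != len(edited):
        raise ValueError("this sketch only handles equal-length sequences")
    if not original:
        return 0.0
    return sum(a != b for a, b in zip(original, edited)) / len(original)


def random_substitutions(phonemes: list[str], rate: float, rng: random.Random) -> list[str]:
    """Substitute round(rate * len) randomly chosen positions with a different phoneme."""
    n_edits = round(rate * len(phonemes))
    positions = rng.sample(range(len(phonemes)), n_edits)
    out = list(phonemes)
    for pos in positions:
        candidates = [p for p in PHONEMES if p != out[pos]]
        out[pos] = rng.choice(candidates)
    return out


# Illustrative example: a stand-in for an LLM edit that changes one phoneme out of five.
original = ["DH", "AH", "T", "R", "IY"]
llm_edited = ["D", "AH", "T", "R", "IY"]

rate = edit_rate(original, llm_edited)
baseline = random_substitutions(original, rate, random.Random(0))
print(rate, edit_rate(original, baseline))
```

Because the baseline changes the same number of positions as the guided edit, any difference in downstream WER can be attributed to which phonemes were edited rather than to how many.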
