Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking

Jihyun Lee; Solee Im; Wonjun Lee; Gary Geunbae Lee

arXiv:2409.06263 · cs.CL · October 31, 2025

TL;DR

This paper presents a controllable phonetic error augmentation technique for dialogue state tracking that improves robustness against ASR errors by generating targeted, phonetically similar errors in key entities.

Contribution

It introduces a novel, controllable data augmentation method that enhances DST robustness by simulating realistic ASR errors on important entities.

Findings

1. Improved DST accuracy in noisy ASR conditions
2. Effective generation of phonetically similar errors
3. Controllable error placement using keyword prompts

Abstract

Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our novel method can control the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
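
The abstract does not give the prompt format, so the following is only a minimal sketch of what a keyword-highlighted prompt could look like. The function names `highlight_keywords` and `build_prompt`, the asterisk marker, and the instruction wording are all assumptions made for illustration, not the authors' implementation. The idea shown is that marking the key entities in an utterance lets the instruction restrict phonetic substitutions to those spans.

```python
import re


def highlight_keywords(utterance: str, keywords: list[str], marker: str = "*") -> str:
    """Wrap each keyword occurrence in `marker` (hypothetical highlighting scheme)."""
    # Longest keywords first, so a multi-word entity wins over its own prefix.
    ordered = sorted(set(keywords), key=len, reverse=True)
    if not ordered:
        return utterance
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Lookarounds keep matches on whole words only.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", utterance)


def build_prompt(utterance: str, keywords: list[str]) -> str:
    """Assemble an illustrative instruction prompt; the wording is an assumption."""
    highlighted = highlight_keywords(utterance, keywords)
    return (
        "Rewrite the sentence as a speech recognizer might mishear it.\n"
        "Replace only the words between asterisks with phonetically similar words "
        "and leave everything else unchanged.\n"
        f"Sentence: {highlighted}\n"
        "Rewritten:"
    )


if __name__ == "__main__":
    print(build_prompt("I need a taxi to Cambridge station", ["Cambridge"]))
```

The resulting string would be sent to whichever LLM performs the augmentation; which model is used and how its output is filtered are outside what the abstract states.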
