Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking
Jihyun Lee, Solee Im, Wonjun Lee, Gary Geunbae Lee

TL;DR
This paper presents a controllable phonetic error augmentation technique for dialogue state tracking that improves robustness against ASR errors by generating targeted, phonetically similar errors in key entities.
Contribution
It introduces a novel, controllable data augmentation method that enhances DST robustness by simulating realistic ASR errors on important entities.
Findings
Improved DST accuracy in noisy ASR conditions
Effective generation of phonetically similar errors
Controllable error placement using keyword prompts
Abstract
Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our novel method can control the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, our method generated sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
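The keyword-highlighted prompting idea in the abstract can be illustrated with a minimal sketch. The highlight tags (`<hl>`, `</hl>`), the function names, and the prompt wording below are all assumptions made for illustration; the abstract does not specify the paper's actual prompt format, and the call to a language model is left out entirely.

```python
import re


def highlight_keywords(utterance, keywords, open_tag="<hl>", close_tag="</hl>"):
    """Wrap each keyword occurrence in the utterance with highlight tags.

    The tag strings are hypothetical markers, not the paper's format.
    """
    # Longest first, so a multi-word entity is matched before any shorter
    # keyword that happens to be a prefix of it.
    ordered = sorted(set(keywords), key=len, reverse=True)
    if not ordered:
        return utterance
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", utterance)


def build_prompt(utterance, keywords):
    """Build an instruction asking for phonetic errors only on highlighted words.

    The instruction text is an assumed example, not taken from the paper.
    """
    highlighted = highlight_keywords(utterance, keywords)
    return (
        "Rewrite the utterance as a speech recognizer might mishear it. "
        "Replace only the highlighted words with phonetically similar words "
        "and leave the rest unchanged.\n"
        f"Utterance: {highlighted}\n"
        "Rewritten:"
    )


if __name__ == "__main__":
    print(build_prompt("I need a taxi to Cambridge station", ["Cambridge"]))
```

Marking the entities explicitly is what would make error placement controllable in this sketch: the model is told where to introduce errors, while the rest of the utterance is meant to stay intact.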
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Cognitive Functions and Memory · Context-Aware Activity Recognition Systems
Methods: Dynamic Sparse Training
