Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades
Donghyuk Jung, Youngwon Choi

TL;DR
This paper investigates how errors from automatic speech recognition affect downstream Korean spoken question answering, revealing key failure channels and potential benefits of direct audio models.
Contribution
It identifies single-character Korean ASR errors as a distinct channel of downstream semantic failure and shows, in an auxiliary comparison, that a large audio language model outperforms an ASR-LLM pipeline with a matched language backbone in noisy Korean SQA.
Findings
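ASR errors cause a relative downstream degradation that is consistent across LLMs with different absolute performance.
Single-character Korean ASR errors can leave the gold answer entirely absent from the downstream prediction.
In an auxiliary comparison, a large audio language model outperforms an ASR-LLM pipeline with a matched language backbone in noisy conditions.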
Abstract
We analyze how automatic speech recognition (ASR) errors propagate through ASR-LLM cascades in Korean spoken question answering (SQA), focusing on downstream semantic failures that conventional ASR metrics cannot fully capture. Our analysis shows that the relative downstream degradation caused by ASR errors is consistent across LLMs with different absolute performance, suggesting that cascade degradation largely tracks ASR-stage information loss. We further identify single-character Korean ASR errors as a distinct semantic-failure channel, where the gold answer becomes entirely absent from the downstream prediction despite only a minimal transcription difference. Finally, an auxiliary comparison shows that a large audio language model outperforms an ASR-LLM pipeline with a matched language backbone in noisy Korean SQA, indicating the potential of direct audio input to mitigate…
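The two quantities the abstract leans on, relative downstream degradation and single-character transcription errors that remove the gold answer, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of substring matching to decide whether the gold answer is present, the reading of "single-character" as one Hangul syllable block after NFC normalization, and the example sentences.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance after NFC normalization.

    Assumption: one precomposed Hangul syllable block counts as one character.
    """
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def relative_degradation(clean_score: float, asr_score: float) -> float:
    """Fraction of the clean-transcript score lost when using ASR output."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - asr_score) / clean_score


def is_single_char_semantic_failure(ref_transcript: str, asr_transcript: str,
                                    gold_answer: str, prediction: str) -> bool:
    """Hypothetical check: exactly one character differs in the transcript,
    yet the gold answer does not appear anywhere in the prediction."""
    one_char_off = edit_distance(ref_transcript, asr_transcript) == 1
    return one_char_off and gold_answer not in prediction


# Made-up example: "사과" (apple) misrecognized as "사고" (accident).
failure = is_single_char_semantic_failure(
    "사과는 어디서 사나요", "사고는 어디서 사나요",
    gold_answer="시장", prediction="보험사에 문의하세요",
)

# Two hypothetical LLMs with different absolute scores but equal relative loss.
loss_a = relative_degradation(0.5, 0.375)
loss_b = relative_degradation(0.8, 0.6)
```

Under these assumptions, the two hypothetical models lose the same fraction of their clean-transcript score despite different absolute scores, which is the pattern the abstract describes as degradation tracking ASR-stage information loss.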
