Pardon? Evaluating Conversational Repair in Large Audio-Language Models

Shuanghong Huang; Jinlei Xu; Youchao Zhou; Yanghao Zhou; Xuan Zhao; Chong Feng; Wenxuan Zhang

arXiv:2601.12973·cs.CL·January 21, 2026

TL;DR

This paper introduces a new evaluation framework for Large Audio-Language Models that assesses their ability to recognize unanswerable inputs and perform conversational repairs, highlighting limitations of current accuracy-focused metrics.

Contribution

The paper proposes the Evaluability Awareness and Repair (EAR) score, a metric that evaluates both answerability and repair behavior, and demonstrates its effectiveness through experiments on spoken QA benchmarks.

Findings

01. Models perform well on answerable inputs but struggle with unanswerability detection.

02. Current metrics overlook the importance of conversational repair in real-world interactions.

03. The study reveals a gap between answer accuracy and conversational reliability in LALMs.

Abstract

Large Audio-Language Models (LALMs) have demonstrated strong performance in spoken question answering (QA), with existing evaluations primarily focusing on answer accuracy and robustness to acoustic perturbations. However, such evaluations implicitly assume that spoken inputs remain semantically answerable, an assumption that often fails in real-world interaction when essential information is missing. In this work, we introduce a repair-aware evaluation setting that explicitly distinguishes between answerable and unanswerable audio inputs. We define answerability as a property of the input itself and construct paired evaluation conditions using a semantic-acoustic masking protocol. Based on this setting, we propose the Evaluability Awareness and Repair (EAR) score, a non-compensatory metric that jointly evaluates task competence under answerable conditions and repair behavior under…
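
The abstract describes EAR as a non-compensatory metric but the text available here does not give its formula. As a rough illustration of what "non-compensatory" means, the sketch below contrasts an arithmetic mean (where a high score on one component can offset a low score on the other) with a harmonic mean (where it cannot). The harmonic mean, the function names, and the example numbers are all assumptions chosen for illustration; they are not the paper's definition of EAR.

```python
def compensatory_score(competence: float, repair: float) -> float:
    """Arithmetic mean: a strong component can mask a weak one."""
    return (competence + repair) / 2


def non_compensatory_score(competence: float, repair: float) -> float:
    """Harmonic mean, used here only as an ASSUMED stand-in for a
    non-compensatory aggregate. It is not the paper's EAR formula.

    Both inputs are expected to be rates in [0, 1]:
      competence: accuracy on answerable inputs (hypothetical name)
      repair:     rate of appropriate repair on unanswerable inputs
                  (hypothetical name)
    """
    for value in (competence, repair):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    if competence == 0.0 or repair == 0.0:
        # A zero on either component forces the aggregate to zero.
        return 0.0
    return 2 * competence * repair / (competence + repair)


if __name__ == "__main__":
    # Hypothetical model: accurate when answerable, rarely repairs.
    competence, repair = 0.9, 0.1
    print("arithmetic mean:", compensatory_score(competence, repair))
    print("harmonic mean:  ", non_compensatory_score(competence, repair))
```

With these made-up inputs the arithmetic mean sits at the midpoint of the two rates, while the harmonic mean stays close to the weaker one, which is the property a non-compensatory score is meant to capture.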

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems