Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

Chun-Yi Kuan; Wei-Ping Huang; Hung-yi Lee

arXiv:2604.25591·eess.AS·April 29, 2026

TL;DR

This paper presents the first systematic empirical evaluation of uncertainty estimation methods for audio-aware large language models (ALLMs), highlighting their strengths and limitations across general audio understanding, reasoning, hallucination detection, and unanswerable question answering.

Contribution

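It benchmarks five uncertainty estimation methods for ALLMs (predictive entropy, length-normalized entropy, semantic entropy, discrete semantic entropy, and P(True)), revealing their relative effectiveness and how that effectiveness depends on the model and the evaluation scenario.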
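This summary does not say which metric is used to compare the methods. A common choice in the uncertainty-estimation literature, assumed here only for illustration, is AUROC: how well an uncertainty score separates incorrect answers from correct ones. The sketch below assumes binary correctness labels per answer; the function name `auroc` and the toy data are hypothetical.

```python
from typing import Sequence


def auroc(uncertainty: Sequence[float], is_wrong: Sequence[bool]) -> float:
    """Probability that a randomly chosen wrong answer gets a higher
    uncertainty score than a randomly chosen correct one (ties count 1/2).
    Assumed evaluation metric, computed by direct pairwise comparison."""
    wrong = [u for u, w in zip(uncertainty, is_wrong) if w]
    right = [u for u, w in zip(uncertainty, is_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for uw in wrong:
        for ur in right:
            if uw > ur:
                wins += 1.0
            elif uw == ur:
                wins += 0.5
    return wins / (len(wrong) * len(right))


# Toy usage with made-up scores: 3 of the 4 (wrong, right) pairs are ordered correctly.
scores = [0.9, 0.2, 0.7, 0.1]
wrong_labels = [True, True, False, False]
print(auroc(scores, wrong_labels))  # -> 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.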

Findings

01

Semantic and verification-based methods outperform token-level baselines in reasoning tasks.

02

Uncertainty method effectiveness varies significantly across different benchmarks and models.

03

Adaptive inference based on uncertainty shows potential for improving reliability.
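The summary does not describe how the adaptive inference is performed. One simple form of uncertainty-based adaptive inference, assumed here purely for illustration, is selective answering: keep the model's answer when its uncertainty score is at or below a threshold, and abstain (or defer to a fallback such as a stronger model or a human) otherwise. The function names, threshold, and toy data below are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def gate(answers: Sequence[str], uncertainty: Sequence[float],
         threshold: float) -> List[Optional[str]]:
    """Keep an answer only if its uncertainty is <= threshold; None marks
    an abstention (a hypothetical stand-in for deferring to a fallback)."""
    return [a if u <= threshold else None for a, u in zip(answers, uncertainty)]


def coverage_and_accuracy(gated: Sequence[Optional[str]],
                          is_correct: Sequence[bool]) -> Tuple[float, float]:
    """Coverage = fraction of questions answered; accuracy is measured
    only on the answered subset."""
    answered = [c for g, c in zip(gated, is_correct) if g is not None]
    coverage = len(answered) / len(gated)
    accuracy = sum(answered) / len(answered) if answered else float("nan")
    return coverage, accuracy


# Toy usage with made-up data: abstaining on the two high-uncertainty
# items raises accuracy on the answered subset at the cost of coverage.
answers = ["dog", "siren", "piano", "rain"]
uncertainty = [0.1, 0.8, 0.3, 0.9]
correct = [True, False, True, False]
gated = gate(answers, uncertainty, threshold=0.5)
print(gated, coverage_and_accuracy(gated, correct))
```

Sweeping the threshold traces a coverage/accuracy trade-off, which is one way to judge whether an uncertainty score is useful for such gating.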

Abstract

Recent audio-aware large language models (ALLMs) have demonstrated strong capabilities across diverse audio understanding and reasoning tasks, but they still frequently produce hallucinated or overly confident outputs. While uncertainty estimation has been extensively studied in text-only LLMs, it remains largely unexplored for ALLMs, where audio-conditioned generation introduces additional challenges such as perceptual ambiguity and cross-modal grounding. In this work, we present the first systematic empirical study of uncertainty estimation in ALLMs. We benchmark five representative methods, namely predictive entropy, length-normalized entropy, semantic entropy, discrete semantic entropy, and P(True), across multiple models and diverse evaluation settings spanning general audio understanding, reasoning, hallucination detection, and unanswerable question answering. Our results…
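The five methods named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each of N sampled responses is assumed to come with per-token natural-log probabilities, all function names are hypothetical, and the semantic-equivalence check (typically a bidirectional entailment model in the text-LLM work that introduced semantic entropy) is replaced by a case-insensitive string match purely for the demo. P(True) is reduced to its final step, turning the model's probability of answering "True" to a self-check prompt into an uncertainty score.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Assumed input format (hypothetical): one list of per-token natural-log
# probabilities per sampled response.
TokenLogprobs = Sequence[Sequence[float]]


def predictive_entropy(samples: TokenLogprobs) -> float:
    """Monte Carlo estimate -(1/N) * sum_i log p(s_i | x) over N samples."""
    seq_logps = [sum(lps) for lps in samples]
    return -sum(seq_logps) / len(seq_logps)


def length_normalized_entropy(samples: TokenLogprobs) -> float:
    """Same estimate, with each sequence log-prob divided by its token count."""
    norm_logps = [sum(lps) / len(lps) for lps in samples]
    return -sum(norm_logps) / len(norm_logps)


def cluster_by_meaning(texts: Sequence[str],
                       equivalent: Callable[[str, str], bool]) -> List[int]:
    """Greedy clustering: each response joins the first cluster whose
    representative it is equivalent to, otherwise it starts a new cluster."""
    reps: List[str] = []
    labels: List[int] = []
    for text in texts:
        for k, rep in enumerate(reps):
            if equivalent(text, rep):
                labels.append(k)
                break
        else:
            reps.append(text)
            labels.append(len(reps) - 1)
    return labels


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def semantic_entropy(samples: TokenLogprobs, labels: Sequence[int]) -> float:
    """Entropy over meaning clusters; each cluster's probability is the summed
    sequence probability of its members, renormalized over clusters."""
    by_cluster: Dict[int, List[float]] = {}
    for lps, k in zip(samples, labels):
        by_cluster.setdefault(k, []).append(sum(lps))
    log_mass = [_logsumexp(v) for v in by_cluster.values()]
    log_total = _logsumexp(log_mass)
    probs = [math.exp(lm - log_total) for lm in log_mass]
    return -sum(p * math.log(p) for p in probs if p > 0)


def discrete_semantic_entropy(labels: Sequence[int]) -> float:
    """Entropy over clusters from sample counts alone (no log-probs needed)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def p_true_uncertainty(p_true: float) -> float:
    """P(True): the model is asked whether its own answer is true; here the
    uncertainty score is simply 1 - P("True")."""
    return 1.0 - p_true


def same_text(a: str, b: str) -> bool:
    """Stand-in for a bidirectional entailment check (assumption)."""
    return a.strip().lower() == b.strip().lower()


# Toy usage with made-up log-probs for three sampled answers.
samples = [[-0.1, -0.2], [-0.1, -0.3], [-1.0, -0.5, -0.7]]
texts = ["a dog barking", "A dog barking", "a car horn"]
labels = cluster_by_meaning(texts, same_text)  # -> [0, 0, 1]
print(predictive_entropy(samples), length_normalized_entropy(samples))
print(semantic_entropy(samples, labels), discrete_semantic_entropy(labels))
```

The split mirrors the distinction in the findings: the first two scores look only at token probabilities, the two semantic variants first group samples by meaning (discrete semantic entropy needs only the cluster counts, which helps when a model exposes no log-probs), and P(True) relies on the model verifying its own answer.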
