Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering
Dinh Phu Tran, Jihoon Jeong, Saad Wazir, Seongah Kim, Thao Do, Cem Subakan, Daeyoung Kim

TL;DR
This paper introduces Adaptive Confidence Refinement (ACR), a method that makes audio-visual question answering (AVQA) systems more reliable by better estimating when a model should abstain from answering, especially when inputs are uncertain or out of distribution.
Contribution
The paper proposes ACR, a lightweight, input-adaptive method for confidence estimation in AVQA models. It keeps the Maximum Softmax Probability (MSP) as the primary confidence signal and adds learned residual corrections with trust gating, which addresses MSP's poor calibration in multimodal models.
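The following Python sketch shows one way to compute a score that keeps MSP as the primary signal and adds a gated residual correction, followed by threshold-based abstention. The functional forms are assumptions and do not come from the paper. The trust gate is a hypothetical sigmoid of a linear map over per-input features, and the residual is a hypothetical tanh of another linear map. The names `refined_confidence`, `answer_or_abstain`, `w_gate`, `b_gate`, `w_res`, `b_res`, and `threshold` are illustrative. In ACR the gate and residual are learned; here their weights are simply passed in.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def refined_confidence(logits, features, w_gate, b_gate, w_res, b_res):
    """Hypothetical ACR-style score: MSP plus a trust-gated residual.

    trust in (0, 1): how far MSP is trusted as-is (near 1 -> keep MSP).
    residual in (-1, 1): correction that matters only when trust is low.
    The linear-sigmoid gate and linear-tanh residual are assumptions.
    """
    msp = softmax(logits).max(axis=-1)            # primary confidence signal
    trust = sigmoid(features @ w_gate + b_gate)   # hypothetical trust gate
    residual = np.tanh(features @ w_res + b_res)  # hypothetical residual head
    # Scale the correction by (1 - trust), so it is negligible where trust is high.
    conf = msp + (1.0 - trust) * residual
    # Keep the score in [0, 1] so a single abstention threshold applies.
    return np.clip(conf, 0.0, 1.0)


def answer_or_abstain(logits, conf, threshold):
    # Answer with the argmax class when confident enough; -1 marks abstention.
    preds = logits.argmax(axis=-1)
    return np.where(conf >= threshold, preds, -1)
```

When the gate is near 1, the score reduces to plain MSP. When the gate is near 0, the residual can pull an overconfident MSP below the abstention threshold, so the model abstains on that input instead of answering.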
Findings
ACR outperforms existing confidence methods on multiple AVQA architectures.
ACR improves in- and out-of-distribution performance and reduces bias.
Theoretical analysis supports ACR's effectiveness in confidence calibration.
Abstract
We present a formal problem formulation for Reliable Audio-Visual Question Answering (-AVQA), in which abstaining is preferred to answering incorrectly. Recent AVQA models achieve high accuracy, but their ability to recognize when they are likely wrong, and to abstain accordingly, remains underexplored. To fill this gap, we explore several approaches and then propose Adaptive Confidence Refinement (ACR), a lightweight method that further improves the performance of -AVQA. Our key insight is that the Maximum Softmax Probability (MSP) is Bayes-optimal only under strong calibration, a condition that deep neural networks, and multimodal models in particular, usually do not meet. Rather than replacing MSP, ACR keeps it as the primary confidence signal and applies input-adaptive residual corrections when MSP is deemed unreliable. ACR…
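The claim that MSP is Bayes-optimal only under strong calibration can be illustrated with the standard Chow-style abstention argument. This sketch does not come from the paper. The cost model is an assumption: a wrong answer costs 1, a correct answer costs 0, and abstaining costs c in (0, 1). In the notation, P(y | x) is the true posterior and p_θ(y | x) is the model's softmax output.

```latex
% Assumed cost model: wrong answer = 1, correct answer = 0, abstain = c, with 0 < c < 1.
\begin{aligned}
&\text{Expected cost of answering } \hat{y}(x) = \arg\max_y P(y \mid x):\quad 1 - \max_y P(y \mid x) \\
&\text{Bayes-optimal rule: answer } \hat{y}(x) \text{ iff } \max_y P(y \mid x) \ge 1 - c, \text{ otherwise abstain} \\
&\text{MSP rule: answer } \arg\max_y p_\theta(y \mid x) \text{ iff } \mathrm{MSP}(x) = \max_y p_\theta(y \mid x) \ge \tau
\end{aligned}
```

The two rules coincide, with τ = 1 − c, when p_θ(y | x) = P(y | x) for every input and answer. When the model is miscalibrated, MSP can be high on inputs that are likely to be answered wrongly. In that case, no single MSP threshold reproduces the optimal answer/abstain decisions.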
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
