Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA
Ayush Pandey, Jai Bardhan, Ishita Jain, Ramya S Hebbalaguppe, Rohan Raju Dhanakshirur, Lovekesh Vig

TL;DR
This paper introduces AlignVQA, a multi-agent debate framework for VQA that improves confidence calibration and so makes answers more reliable. Diverse model responses are critiqued, refined, and aggregated, and fine-tuning uses a new calibration loss.
Contribution
The paper proposes a novel multi-agent debate framework and a calibration-aware loss function to enhance confidence calibration in VQA systems, especially under visual uncertainty.
Findings
Improved confidence calibration across multiple VQA benchmarks.
Specialized agents produce better-aligned confidence estimates.
Debate-based aggregation enhances answer reliability.
Abstract
In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect becomes especially important when such systems operate autonomously and must make decisions under visual uncertainty. While modern VQA systems, powered by advanced vision-language models (VLMs), are increasingly used in high-stakes domains like medical diagnostics and autonomous navigation due to their improved accuracy, the reliability of their confidence estimates remains under-examined. In particular, these systems often produce overconfident responses. To address this, we introduce AlignVQA, a debate-based multi-agent framework in which diverse specialized VLMs, each following a distinct prompting strategy, generate candidate answers and then engage in a two-stage interaction: generalist agents…
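To make the notion of calibration in the abstract concrete, the following is a minimal sketch of Expected Calibration Error (ECE), a widely used binned calibration metric. Whether the paper reports exactly this metric is an assumption, since the summary here does not say. The function name `expected_calibration_error`, the parameter `n_bins`, and the equal-width binning scheme are illustrative choices, not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the sample-weighted mean of |accuracy - mean confidence|.

    confidences: predicted confidence per answer, each in [0, 1]
    correct:     booleans, True where the answer was right
    n_bins:      number of equal-width confidence bins (illustrative default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width bins; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # An overconfident system: 95% stated confidence, but only half the answers are right.
    print(expected_calibration_error([0.95] * 4, [True, True, False, False]))
```

In this toy case the system states 95% confidence while being right half the time, so the ECE comes out at about 0.45; a perfectly calibrated system would score 0. This is the kind of overconfidence gap the abstract describes.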
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
