Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison
Qian Yang, Weixiang Yan, Aishwarya Agrawal

TL;DR
This paper introduces DeCC, a novel method for assessing the reliability of VLM responses by comparing direct answers with answers derived from decomposed sub-questions. Across multiple tasks, its reliability estimates correlate better with task accuracy.
Contribution
DeCC is a new approach that measures VLM answer reliability by checking whether a direct answer agrees with the answer reached by reasoning over decomposed sub-questions.
Findings
DeCC outperforms existing methods in correlating with task accuracy.
DeCC is effective across six vision-language tasks and three VLMs.
The method mitigates the overconfidence and confirmation-bias problems of existing reliability measures.
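The findings above do not specify how correlation with task accuracy is measured. As a generic illustration only (not the paper's evaluation protocol), binary reliability judgments can be checked against per-item correctness in two ways. The first compares accuracy on items judged reliable with accuracy on items judged unreliable. The second computes the phi coefficient, which is the Pearson correlation between two binary variables. The function names below are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def accuracy_split(reliable: List[bool], correct: List[bool]) -> Tuple[float, float]:
    """Return (accuracy on items judged reliable, accuracy on items judged unreliable)."""
    kept = [c for r, c in zip(reliable, correct) if r]
    flagged = [c for r, c in zip(reliable, correct) if not r]

    def acc(xs: List[bool]) -> float:
        # NaN signals an empty group rather than a misleading 0.0.
        return sum(xs) / len(xs) if xs else float("nan")

    return acc(kept), acc(flagged)


def phi_coefficient(reliable: List[bool], correct: List[bool]) -> float:
    """Pearson correlation between two binary sequences, via the 2x2 contingency table."""
    pairs = list(zip(reliable, correct))
    n11 = sum(1 for r, c in pairs if r and c)          # judged reliable, answer correct
    n10 = sum(1 for r, c in pairs if r and not c)      # judged reliable, answer wrong
    n01 = sum(1 for r, c in pairs if not r and c)      # judged unreliable, answer correct
    n00 = sum(1 for r, c in pairs if not r and not c)  # judged unreliable, answer wrong
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either sequence is constant; this sketch returns 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


if __name__ == "__main__":
    judged = [True, True, True, False, False]  # toy reliability judgments
    gold = [True, True, False, False, True]    # toy per-item correctness
    print(accuracy_split(judged, gold))
    print(phi_coefficient(judged, gold))
```

A useful reliability measure should show higher accuracy in the "reliable" group than in the "unreliable" group, and a positive phi coefficient.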
Abstract
Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence generation, often suffer from overconfidence. Other methods use self-consistency comparison but are affected by confirmation biases. To alleviate these, we propose Decompose and Compare Consistency (DeCC) for reliability measurement. By comparing the consistency between the direct answer generated using the VLM's internal reasoning process, and the indirect answers obtained by decomposing the question into sub-questions and reasoning over the sub-answers produced by the VLM, DeCC measures the…
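The abstract describes DeCC's pipeline only at a high level, and the text is truncated. The following Python sketch illustrates the decompose-and-compare idea under stated assumptions. `answer_fn`, `decompose_fn`, and `reason_fn` are hypothetical callables standing in for model calls on a fixed image. Agreement is judged by normalized exact match, which is a simplification here and not necessarily the paper's consistency criterion. The toy stand-ins exist only so the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical callables standing in for model calls on one fixed image.
# These names and signatures are illustrative assumptions, not the paper's API.
AnswerFn = Callable[[str], str]                        # question -> VLM answer
DecomposeFn = Callable[[str], List[str]]               # question -> sub-questions
ReasonFn = Callable[[str, List[str], List[str]], str]  # question, sub-Qs, sub-As -> answer


@dataclass
class Judgment:
    direct_answer: str
    indirect_answer: str
    reliable: bool


def normalize(answer: str) -> str:
    """Strip whitespace and trailing periods, then lowercase, for exact-match comparison."""
    return answer.strip().rstrip(".").lower()


def decompose_and_compare(question: str, answer_fn: AnswerFn,
                          decompose_fn: DecomposeFn, reason_fn: ReasonFn) -> Judgment:
    # 1. Direct answer, produced by the VLM's internal reasoning alone.
    direct = answer_fn(question)
    # 2. Break the question into simpler sub-questions.
    sub_questions = decompose_fn(question)
    # 3. Have the VLM answer each sub-question.
    sub_answers = [answer_fn(q) for q in sub_questions]
    # 4. Reason over the sub-answers to obtain an indirect answer.
    indirect = reason_fn(question, sub_questions, sub_answers)
    # 5. Judge the direct answer reliable only if both routes agree.
    return Judgment(direct, indirect, normalize(direct) == normalize(indirect))


# Toy stand-ins so the sketch runs without a model (all hypothetical).
TOY_ANSWERS: Dict[str, str] = {
    "Is the man holding an umbrella?": "Yes",
    "What is the man holding?": "an umbrella",
}


def toy_answer(q: str) -> str:
    return TOY_ANSWERS.get(q, "unknown")


def toy_decompose(q: str) -> List[str]:
    return ["What is the man holding?"]


def toy_reason(q: str, subs: List[str], answers: List[str]) -> str:
    # Naive rule: "yes" if every content word of the sub-answer appears in the question.
    content = [w for w in answers[0].lower().split() if len(w) > 2]
    return "yes" if content and all(w in q.lower() for w in content) else "no"


if __name__ == "__main__":
    print(decompose_and_compare("Is the man holding an umbrella?",
                                toy_answer, toy_decompose, toy_reason))
```

Connecting real models would mean replacing only the three callables. Free-form answers would likely need a softer agreement check than exact match.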
Taxonomy
Topics: Topic Modeling · Multi-Agent Systems and Negotiation · Speech and dialogue systems
