Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Qian Yang; Weixiang Yan; Aishwarya Agrawal

arXiv:2407.07840 · cs.CV · October 10, 2024

TL;DR

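This paper introduces DeCC, a method for assessing the reliability of a VLM's answer by checking whether its direct answer agrees with the answer obtained by decomposing the question into sub-questions and reasoning over the sub-answers. The resulting reliability measure correlates with task accuracy better than existing methods across six vision-language tasks and three VLMs.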

Contribution

DeCC is a new approach that measures VLM answer reliability through consistency comparison between direct and decomposed question reasoning.

Findings

01. DeCC's reliability measure correlates with task accuracy better than existing methods.

02. DeCC is effective across six vision-language tasks and three VLMs.

03. The method alleviates the overconfidence and confirmation bias that affect existing reliability measures.

Abstract

Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence generation, often suffer from overconfidence. Other methods use self-consistency comparison but are affected by confirmation biases. To alleviate these, we propose Decompose and Compare Consistency (DeCC) for reliability measurement. By comparing the consistency between the direct answer generated using the VLM's internal reasoning process, and the indirect answers obtained by decomposing the question into sub-questions and reasoning over the sub-answers produced by the VLM, DeCC measures the…
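The comparison the abstract describes can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the `vlm`, `decompose`, and `reason` callables, the `normalize` helper, and the exact-match consistency check are all assumptions introduced here, and the stub components stand in for real models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not from the paper): each component is a plain callable.
VLM = Callable[[object, str], str]                      # (image, prompt) -> answer
Decomposer = Callable[[str], List[str]]                 # question -> sub-questions
Reasoner = Callable[[str, List[Tuple[str, str]]], str]  # (question, QA pairs) -> answer


def normalize(answer: str) -> str:
    """Lowercase, drop trailing periods, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def decc_reliability(image: object, question: str, vlm: VLM,
                     decompose: Decomposer, reason: Reasoner) -> Dict[str, object]:
    # Direct answer: the VLM answers the original question in one step.
    direct = vlm(image, question)
    # Indirect answer: decompose, answer each sub-question, then reason over them.
    qa_pairs = [(q, vlm(image, q)) for q in decompose(question)]
    indirect = reason(question, qa_pairs)
    # Assumption: consistency is approximated by normalized exact match.
    consistent = normalize(direct) == normalize(indirect)
    return {"direct": direct, "indirect": indirect, "consistent": consistent}


# Stub components for demonstration only.
def stub_vlm(image: object, prompt: str) -> str:
    answers = {
        "Is the cat on the sofa?": "Yes.",
        "Is there a cat?": "yes",
        "Is there a sofa?": "yes",
    }
    return answers.get(prompt, "no")


def stub_decompose(question: str) -> List[str]:
    return ["Is there a cat?", "Is there a sofa?"]


def stub_reason(question: str, qa_pairs: List[Tuple[str, str]]) -> str:
    # Toy rule: answer "yes" only if every sub-answer is "yes".
    return "yes" if all(normalize(a) == "yes" for _, a in qa_pairs) else "no"


result = decc_reliability(None, "Is the cat on the sofa?",
                          stub_vlm, stub_decompose, stub_reason)
print(result["consistent"])
```

Under these assumptions, agreement between the direct and indirect answers is treated as a sign that the direct answer is reliable, and disagreement as a sign that it is not. In practice, exact match would likely be too strict for free-form answers, so the comparison step is the part most in need of replacement.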

Taxonomy

Topics: Topic Modeling · Multi-Agent Systems and Negotiation · Speech and dialogue systems