Towards Statistical Factuality Guarantee for Large Vision-Language Models

Zhuohang Li, Chao Yan, Nicholas J. Jackson, Wendi Cui, Bo Li, Jiaxin Zhang, Bradley A. Malin

arXiv:2502.20560 · cs.LG · March 3, 2025

TL;DR

This paper introduces ConfLVLM, a conformal prediction framework that provides statistical guarantees on the factuality of large vision-language model (LVLM) outputs, reducing hallucinations and making image-conditioned text generation more reliable.

Contribution

The paper proposes a novel conformal prediction-based method to verify and filter claims in LVLM outputs, offering finite-sample guarantees on factuality.
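To illustrate what "filtering claims with a finite-sample guarantee" can mean, here is a minimal split-conformal sketch. It assumes each claim comes with an uncertainty score (higher means less trustworthy) and that a labeled calibration set of responses is available; all function names are hypothetical, and the guarantee shown is a response-level one ("every incorrect claim is removed with probability at least 1 - alpha"), which may differ from the guarantee ConfLVLM actually targets.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical claim representation: (uncertainty score, is_correct).
# Higher scores mean the claim is less trustworthy.
Claim = Tuple[float, bool]


def min_wrong_score(claims: Sequence[Claim]) -> float:
    """Smallest uncertainty among incorrect claims (inf if all are correct).

    Keeping only claims with score < tau removes every incorrect claim
    exactly when tau <= this value.
    """
    wrong = [score for score, ok in claims if not ok]
    return min(wrong) if wrong else math.inf


def calibrate_threshold(calibration: List[List[Claim]], alpha: float) -> float:
    """Split-conformal threshold on claim uncertainty.

    If a new response is exchangeable with the calibration responses, all of
    its incorrect claims are filtered with probability at least 1 - alpha.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(calibration)
    scores = sorted(min_wrong_score(resp) for resp in calibration)
    k = math.floor((n + 1) * alpha)
    if k < 1:
        return -math.inf  # too few calibration responses: filter everything
    return scores[k - 1]  # k-th smallest calibration score


def filter_claims(scores: Sequence[float], tau: float) -> List[int]:
    """Indices of the claims that are kept (score strictly below tau)."""
    return [i for i, s in enumerate(scores) if s < tau]
```

Claims whose score ties the threshold are dropped, which keeps the filter on the conservative side; a smaller alpha lowers the threshold and removes more claims.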

Findings

1. ConfLVLM reduces the claim error rate from 87.8% to 10.0%.
2. It achieves a 95.3% true positive rate in filtering out unreliable claims.
3. It applies to any black-box LVLM and works with various uncertainty measures.
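The findings quote a claim error rate and a true positive rate. The sketch below shows one plausible reading of these two metrics, assuming the error rate is computed over the claims that survive filtering and that a "positive" is an incorrect claim that the filter removes; the paper's exact definitions may differ, and the labels used here are made up for illustration.

```python
from typing import Sequence


def claim_error_rate(correct: Sequence[bool], kept: Sequence[bool]) -> float:
    """Fraction of retained claims that are incorrect (0.0 if nothing is kept)."""
    retained = [ok for ok, keep in zip(correct, kept) if keep]
    if not retained:
        return 0.0  # convention: an empty output contains no errors
    return sum(not ok for ok in retained) / len(retained)


def filtering_tpr(correct: Sequence[bool], kept: Sequence[bool]) -> float:
    """Fraction of incorrect claims that the filter removed (NaN if none exist)."""
    removed = [not keep for ok, keep in zip(correct, kept) if not ok]
    if not removed:
        return float("nan")  # undefined when every claim is correct
    return sum(removed) / len(removed)


# Hypothetical labels for five claims extracted from one model output.
correct = [True, False, False, True, False]
unfiltered = [True] * 5
filtered = [True, False, True, True, False]
print(claim_error_rate(correct, unfiltered))  # 0.6
print(claim_error_rate(correct, filtered))    # 0.3333333333333333
print(filtering_tpr(correct, filtered))       # 0.6666666666666666
```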

Abstract

Advancements in Large Vision-Language Models (LVLMs) have demonstrated promising performance in a variety of vision-language tasks involving image-conditioned free-form text generation. However, growing concerns about hallucinations in LVLMs, where the generated text is inconsistent with the visual context, are becoming a major impediment to deploying these models in applications that demand guaranteed reliability. In this paper, we introduce a framework to address this challenge, ConfLVLM, which is grounded in conformal prediction to achieve finite-sample distribution-free statistical guarantees on the factuality of LVLM output. This framework treats an LVLM as a hypothesis generator, where each generated text detail (or claim) is considered an individual hypothesis. It then applies a statistical hypothesis testing procedure to verify each claim using efficient heuristic uncertainty…
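The abstract treats each generated claim as a hypothesis to be tested. One generic way to make that concrete (an illustrative choice, not necessarily the paper's procedure) is to test H0: "this claim is incorrect" with a conformal p-value computed against the uncertainty scores of calibration claims known to be wrong, then keep the claims rejected by the Benjamini-Hochberg procedure. When incorrect test claims are exchangeable with the incorrect calibration claims, this combination is known to bound the expected fraction of incorrect claims among those kept by alpha. The function names and scores below are hypothetical.

```python
from typing import List, Sequence


def conformal_pvalues(null_scores: Sequence[float],
                      test_scores: Sequence[float]) -> List[float]:
    """Conformal p-values for H0: 'this claim is incorrect'.

    null_scores are uncertainty scores of calibration claims known to be
    incorrect. A test claim whose uncertainty is low relative to them gets a
    small p-value, i.e. evidence that it is not a typical incorrect claim.
    """
    n = len(null_scores)
    return [(1 + sum(s <= u for s in null_scores)) / (n + 1) for u in test_scores]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Step-up BH: indices of rejected nulls (claims accepted as reliable)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose sorted p-value clears the BH line alpha * rank / m.
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical uncertainty scores of incorrect claims in a calibration set.
null = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
pvals = conformal_pvalues(null, [0.1, 0.2, 0.72, 0.95])
kept = benjamini_hochberg(pvals, alpha=0.3)  # indices of claims to keep
```

Only the two low-uncertainty claims survive here; the claim scored 0.72 sits among the known-incorrect calibration scores and is filtered out.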
