ViFP: A Framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs

Ben Zhang; LuLu Yu; Lei Gao; QuanJiang Guo; Jing Liu; Hui Gao

arXiv:2508.04201 · cs.CV · November 6, 2025

TL;DR

ViFP is a framework that detects and corrects false positive reasoning in vision-language models, improving their logical consistency and reliability without extensive data requirements.

Contribution

It introduces a novel approach for directly detecting and correcting false positives in VLM reasoning paths, enhancing reliability and accuracy.

Findings

01. Improves accuracy by up to 5.4% on A-OKVQA

02. Reduces false positive rate significantly

03. Outperforms previous state-of-the-art methods

Abstract

During reasoning in vision-language models (VLMs), false positive (FP) reasoning occurs when a model produces the correct answer but follows an incorrect reasoning path, which undermines reasoning reliability. Existing approaches mainly rely on prompt engineering, knowledge distillation, or reinforcement learning to improve reasoning reliability, the latter two of which require large amounts of high-quality data and thus limit practical applicability. Few approaches have focused on directly detecting and correcting FPs. To address these issues, we propose ViFP, a framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs. ViFP builds effective reasoning paths through multi-turn QA and dynamically analyzes the consistency of the reasoning path to identify potential FPs. It also introduces a targeted reasoning chain correction mechanism to modify FP reasoning,…
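
To make the FP definition in the abstract concrete, the minimal Python sketch below flags a sample as a false positive when its final answer is correct but its reasoning path is not. The `Sample` fields, the function names, and the choice to measure the FP rate over correctly answered samples are assumptions made for illustration only; the paper's own detection procedure and metric definition may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    # Both labels are hypothetical annotations, not fields defined by the paper.
    answer_correct: bool     # final answer matches the ground truth
    reasoning_correct: bool  # reasoning path was judged valid


def is_false_positive(sample: Sample) -> bool:
    """Correct answer reached through an incorrect reasoning path."""
    return sample.answer_correct and not sample.reasoning_correct


def false_positive_rate(samples: Iterable[Sample]) -> float:
    """Share of correctly answered samples whose reasoning is incorrect.

    Assumption: the rate is normalized by the number of correct answers.
    Returns 0.0 when no sample was answered correctly.
    """
    correct = [s for s in samples if s.answer_correct]
    if not correct:
        return 0.0
    return sum(is_false_positive(s) for s in correct) / len(correct)
```

Under this assumed definition, a model can score well on answer accuracy while still having a high FP rate, which is the reliability gap the abstract describes.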
