ViFP: A Framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs
Ben Zhang, LuLu Yu, Lei Gao, QuanJiang Guo, Jing Liu, Hui Gao

TL;DR
ViFP is a framework that detects and corrects false positive reasoning in vision-language models, improving their logical consistency and reliability without extensive data requirements.
Contribution
ViFP introduces a novel approach for directly detecting and correcting false positives in VLM reasoning paths, enhancing reliability and accuracy.
Findings
Improves accuracy by up to 5.4% on A-OKVQA
Reduces false positive rate significantly
Outperforms previous state-of-the-art methods
Abstract
During reasoning in vision-language models (VLMs), false positive (FP) reasoning occurs when a model produces the correct answer but follows an incorrect reasoning path, which undermines reasoning reliability. Existing approaches mainly rely on prompt engineering, knowledge distillation, or reinforcement learning to improve reasoning reliability, all of which require large amounts of high-quality data and thus limit practical applicability. Few approaches have focused on directly detecting and correcting FPs. To address these issues, we propose ViFP, a framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs. ViFP builds effective reasoning paths through multi-turn QA and dynamically analyzes the consistency of the reasoning path to identify potential FPs. It also introduces a targeted reasoning chain correction mechanism to modify FP reasoning,…
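The FP definition in the abstract (correct answer, incorrect reasoning path) can be made concrete with a minimal sketch. This is an illustration only, not ViFP's detection procedure: the `ReasoningRecord` type, the `path_valid` label, and the choice to measure the FP rate over correctly answered samples are all assumptions introduced here, since the excerpt does not specify how the rate is computed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReasoningRecord:
    """One evaluated sample (hypothetical structure, not from the paper)."""
    answer_correct: bool  # final answer matches the reference answer
    path_valid: bool      # reasoning path was judged sound (assumed label)


def is_false_positive(record: ReasoningRecord) -> bool:
    # FP reasoning as defined in the abstract: the answer is right,
    # but the path that led to it is not.
    return record.answer_correct and not record.path_valid


def false_positive_rate(records: Sequence[ReasoningRecord]) -> float:
    # Assumption: the rate is taken over correctly answered samples only.
    correct = [r for r in records if r.answer_correct]
    if not correct:
        return 0.0
    return sum(is_false_positive(r) for r in correct) / len(correct)
```

Under this assumed definition, a model that answers three questions correctly but reasons soundly on only one of them has an FP rate of 2/3, regardless of how many questions it answers incorrectly.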
