CoRGI: Verified Chain-of-Thought Reasoning with Post-hoc Visual Grounding

Shixin Yi; Lin Shang

arXiv:2508.00378 · cs.AI · October 15, 2025

TL;DR

CoRGI enhances multimodal reasoning by verifying and grounding chain-of-thought explanations in visual evidence, significantly reducing hallucinations and improving answer accuracy and interpretability across multiple benchmarks and models.

Contribution

This paper introduces CoRGI, a post-hoc verification framework that grounds each chain-of-thought reasoning step in visual evidence to improve the trustworthiness of vision-language models (VLMs).

Findings

1. Improves answer accuracy across five benchmarks.
2. Reduces hallucinations and unsupported claims.
3. Enhances interpretability and trustworthiness.

Abstract

Multimodal reasoning with vision-language models (VLMs) often suffers from hallucinations, because models tend to generate explanations after only a superficial inspection of the image. We present CoRGI (Chain of Reasoning with Grounded Insights), a framework that improves reasoning reliability through post-hoc verification of chain-of-thought outputs. Given a VLM-generated rationale, CoRGI decomposes it into step-wise statements, grounds each step in visual evidence, and filters or corrects unsupported claims before producing the final answer. Experiments on five challenging benchmarks (VCR, ScienceQA, MMMU, MathVista, and HallusionBench) demonstrate that CoRGI consistently improves both answer accuracy and explanation faithfulness across multiple VLM backbones, including Qwen-2.5VL, LLaVA-1.6, and Gemma3-12B. Beyond quantitative gains,…
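The decompose, ground, and filter loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the naive sentence-splitting decomposer, the `ground_step` callback, the 0.5 threshold, and the toy label-matching grounder (a stand-in for a VLM or grounding model that checks a statement against the image) are all hypothetical. The sketch only filters unsupported steps; the correction path the abstract also mentions is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical grounding callback: maps a statement to (support score, evidence note).
GroundFn = Callable[[str], Tuple[float, Optional[str]]]


@dataclass
class StepResult:
    statement: str
    supported: bool
    evidence: Optional[str]


def decompose_rationale(rationale: str) -> List[str]:
    """Split a rationale into step-wise statements (naive sentence split, an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", rationale.strip())
    return [p.strip() for p in parts if p.strip()]


def verify_rationale(rationale: str, ground_step: GroundFn,
                     threshold: float = 0.5) -> List[StepResult]:
    """Ground each step and mark it supported if its score reaches the threshold."""
    results = []
    for step in decompose_rationale(rationale):
        score, evidence = ground_step(step)
        supported = score >= threshold
        results.append(StepResult(step, supported, evidence if supported else None))
    return results


def build_answer_context(results: List[StepResult]) -> str:
    """Keep only supported steps as context for the final answering pass."""
    return " ".join(r.statement for r in results if r.supported)


# --- Toy example: a set of detected labels stands in for the image (hypothetical). ---
SCENE_LABELS = {"dog", "frisbee", "grass"}


def toy_ground(step: str) -> Tuple[float, Optional[str]]:
    """Score 1.0 if the step mentions any detected label, else 0.0."""
    words = set(re.findall(r"[a-z]+", step.lower()))
    hits = sorted(words & SCENE_LABELS)
    if not hits:
        return 0.0, None
    return 1.0, "detected: " + ", ".join(hits)


rationale = "A dog is jumping. The dog catches a frisbee. A cat watches from the fence."
results = verify_rationale(rationale, toy_ground)
context = build_answer_context(results)
print(context)  # A dog is jumping. The dog catches a frisbee.
```

Passing the grounder in as a callback keeps the decomposition and filtering logic independent of any particular backbone; in a real system it would query a VLM or grounding model with the image instead of matching words against a fixed label set.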

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis