Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

Xinzhuo Li; Adheesh Juvekar; Jiaxun Zhang; Xingyou Liu; Muntasir Wahed; Kiet A. Nguyen; Yifan Shen; Tianjiao Yu; Ismini Lourentzou

arXiv:2506.21546·cs.CV·April 24, 2026



TL;DR

This paper introduces a new benchmark and method for diagnosing and reducing pixel-grounding hallucinations in segmentation vision-language models (VLMs) through counterfactual reasoning and fine-tuning.

Contribution

It formalizes the task of Counterfactual Segmentation Reasoning, creates the HalluSegBench benchmark, and proposes RobustSeg with counterfactual fine-tuning to mitigate hallucinations.

Findings

01. RobustSeg reduces hallucinations by 30%.

02. HalluSegBench enables diagnosis of vision-driven hallucinations.

03. Counterfactual fine-tuning improves segmentation performance.

Abstract

Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding, yet they remain prone to pixel-grounding hallucinations, producing masks for incorrect objects or for objects that are entirely absent. Existing evaluations rely almost entirely on text- or label-based perturbations, which check only whether the predicted mask matches the queried label. Such evaluations overlook the spatial footprint and severity of hallucination and therefore fail to reveal vision-driven hallucinations, which are more challenging and more prevalent. To address this gap, we formalize the task of Counterfactual Segmentation Reasoning (CSR), where a model must segment the referenced object in the factual image and abstain in its counterfactual counterpart. To support this task, we curate HalluSegBench, the first large-scale benchmark to diagnose referring and reasoning…
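The CSR setting can be made concrete with a per-pair score: segment correctly on the factual image, produce nothing on the counterfactual one. The sketch below is a hypothetical illustration, not HalluSegBench's official metric suite. It assumes binary masks stored as NumPy arrays, scores the factual image by mask IoU, and scores the counterfactual image by the fraction of pixels the model still labels, where an empty mask means the model abstained. The function names (`mask_iou`, `hallucinated_fraction`, `csr_pair_scores`) and the `abstain_tol` threshold are assumptions introduced here for illustration.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two binary masks; returns 1.0 when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def hallucinated_fraction(pred_cf: np.ndarray) -> float:
    """Share of pixels segmented on the counterfactual image.

    The referenced object is absent there, so every predicted pixel is a
    hallucination; 0.0 means the model abstained completely.
    """
    return float(pred_cf.astype(bool).mean())


def csr_pair_scores(pred_factual, gt_factual, pred_counterfactual, abstain_tol=0.0):
    """Score one factual/counterfactual pair (hypothetical metric, for illustration)."""
    area = hallucinated_fraction(pred_counterfactual)
    return {
        "factual_iou": mask_iou(pred_factual, gt_factual),
        "counterfactual_area": area,
        "abstained": bool(area <= abstain_tol),
    }


# Toy 4x4 example: the object occupies the top-left 2x2 block in the factual image.
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True
pred = gt.copy()
pred[0, 2] = True          # one extra pixel, so IoU = 4 / 5
pred_cf = np.zeros((4, 4), dtype=bool)
pred_cf[3, 2:] = True      # two hallucinated pixels on the counterfactual image
print(csr_pair_scores(pred, gt, pred_cf))
# {'factual_iou': 0.8, 'counterfactual_area': 0.125, 'abstained': False}
```

A label-only check would only ask whether a mask was returned for the queried label. The counterfactual area instead grows with the spatial footprint of the hallucinated mask, so a large spurious mask is penalized more heavily than a few stray pixels.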
