Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

Dhita Putri Pratama; Soyeon Caren Han; Yihao Ding

arXiv:2602.20878·cs.AI·February 25, 2026

TL;DR

This paper introduces structured relevance graphs to diagnose and improve causal reasoning in vision-language models, revealing that structural guidance enhances reasoning more than capacity alone.

Contribution

We propose Vision-Language Causal Graphs (VLCGs) and ViLCaR, a diagnostic benchmark, to evaluate and enhance causal reasoning in large vision-language models (LVLMs).

Findings

01. Injecting structured relevance improves attribution accuracy.

02. LVLMs' reasoning limitations stem more from a lack of structural guidance than from limited capacity.

03. Structured evaluation reveals reasoning gaps that answer correctness alone does not show.

Abstract

Large Vision-Language Models (LVLMs) achieve strong performance on visual question answering benchmarks, yet often rely on spurious correlations rather than genuine causal reasoning. Existing evaluations primarily assess the correctness of the answers, making it unclear whether failures arise from limited reasoning capability or from misidentifying causally relevant information. We introduce Vision-Language Causal Graphs (VLCGs), a structured, query-conditioned representation that explicitly encodes causally relevant objects, attributes, relations, and scene-grounded assumptions. Building on this representation, we present ViLCaR, a diagnostic benchmark comprising tasks for Causal Attribution, Causal Inference, and Question Answering, along with graph-aligned evaluation metrics that assess relevance identification beyond final answer accuracy. Experiments in state-of-the-art LVLMs show…
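
To make the abstract's description concrete, here is a minimal Python sketch of how a query-conditioned graph with objects, attributes, relations, and scene-grounded assumptions might be stored and compared. Everything in it is an assumption for illustration: the names `Relation`, `CausalGraph`, and `relevance_scores`, the example scene, and the set-overlap precision/recall scoring are hypothetical stand-ins. The excerpt above does not define the paper's graph schema or its graph-aligned metrics.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Hypothetical directed relation between two scene elements."""
    subject: str
    predicate: str
    obj: str


@dataclass
class CausalGraph:
    """Hypothetical container for a query-conditioned relevance graph."""
    query: str
    objects: set[str] = field(default_factory=set)
    attributes: set[tuple[str, str]] = field(default_factory=set)  # (object, attribute)
    relations: set[Relation] = field(default_factory=set)
    assumptions: set[str] = field(default_factory=set)

    def elements(self) -> set:
        """Return all graph elements, tagged by kind so kinds never collide."""
        tagged = set()
        tagged |= {("object", o) for o in self.objects}
        tagged |= {("attribute", a) for a in self.attributes}
        tagged |= {("relation", r) for r in self.relations}
        tagged |= {("assumption", s) for s in self.assumptions}
        return tagged


def relevance_scores(predicted: CausalGraph, reference: CausalGraph) -> dict[str, float]:
    """Set-overlap precision, recall, and F1 over tagged graph elements.

    This is an illustrative stand-in, not the metric used in the paper.
    """
    pred, ref = predicted.elements(), reference.elements()
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example scene: which elements matter for the query?
query = "Why might the glass fall?"
reference = CausalGraph(
    query=query,
    objects={"glass", "table edge"},
    attributes={("glass", "tilted")},
    relations={Relation("glass", "on", "table edge")},
    assumptions={"gravity acts on the glass"},
)
predicted = CausalGraph(
    query=query,
    objects={"glass", "cat"},  # "cat" is a spurious element
    relations={Relation("glass", "on", "table edge")},
)
scores = relevance_scores(predicted, reference)
```

In this invented example the prediction could still lead to a correct final answer while missing three of the five reference elements, which is the kind of gap that answer-only evaluation would not surface.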

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling