When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
Jonathan Nemitz, Carsten Eickhoff, Junyi Jessy Li, Kyle Mahowald, Michal Golovanevsky, William Rudman

TL;DR
This paper introduces the GCA dataset to evaluate whether vision-language models and humans follow their own reasoning rules, revealing that models often violate their introspective rules while humans generally adhere to theirs.
Contribution
The study presents a new benchmark dataset, GCA, and provides empirical evidence that VLMs' self-knowledge is miscalibrated, whereas humans remain faithful to the rules they state.
Findings
VLMs violate their own rules in nearly 60% of cases.
Humans mostly follow their stated rules; their violations stem from overestimating color coverage.
World-knowledge priors reduce faithfulness in VLMs, unlike in humans.
Abstract
Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere to their introspective reasoning are central challenges for trustworthy deployment. To study this, we introduce the Graded Color Attribution (GCA) dataset, a controlled benchmark designed to elicit decision rules and evaluate participant faithfulness to these rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color an object must have to receive that color label. We then compare these rules with their subsequent color attribution decisions. Our findings reveal that models systematically…
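The comparison between a stated threshold and later labeling decisions can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the `Trial` type, the function names, and the scoring convention (a trial counts as a violation when the label disagrees with what the stated threshold predicts) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One hypothetical color-attribution decision."""
    coverage: float  # percent of object pixels in the target color, 0-100
    labeled: bool    # True if the participant applied the color label


def violates(threshold: float, trial: Trial) -> bool:
    """Return True if the decision contradicts the stated threshold rule.

    Assumed rule: the label should be applied exactly when
    coverage >= threshold.
    """
    predicted = trial.coverage >= threshold
    return predicted != trial.labeled


def violation_rate(threshold: float, trials: Sequence[Trial]) -> float:
    """Fraction of trials that contradict the stated threshold."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(violates(threshold, t) for t in trials) / len(trials)
```

For instance, a participant who states a 50% threshold but applies the label at 40% coverage has one violation; if their three other decisions (20% unlabeled, 60% and 80% labeled) are consistent with the rule, the rate under this convention is 0.25.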
