Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies

Kazuki Hayashi; Shintaro Ozaki; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe

arXiv:2505.17461·cs.CV·January 29, 2026

TL;DR

This paper investigates whether large vision-language models (LVLMs) can simulate the differences in human color perception caused by color vision deficiencies. It finds that the models lack mechanisms for representing alternative perceptual experiences, which limits their usefulness in accessibility applications.

Contribution

The study uses the Ishihara Test to evaluate whether LVLMs can model the perceptual variation associated with color vision deficiencies, and it documents the limitations of current models on this task.

Findings

01. Models possess factual knowledge about color vision deficiencies.

02. Models fail to replicate the perceptual differences experienced by affected individuals.

03. Current systems lack mechanisms for representing alternative perceptual experiences.

Abstract

Large-scale Vision-Language Models (LVLMs) are being deployed in real-world settings that require visual inference. As capabilities improve, applications in navigation, education, and accessibility are becoming practical. These settings require accommodation of perceptual variation rather than assuming a uniform visual experience. Color perception illustrates this requirement: it is central to visual understanding yet varies across individuals due to Color Vision Deficiencies, an aspect largely ignored in multimodal AI. In this work, we examine whether LVLMs can account for variation in color perception using the Ishihara Test. We evaluate model behavior through generation, confidence, and internal representation, using Ishihara plates as controlled stimuli that expose perceptual differences. Although models possess factual knowledge about color vision deficiencies and can describe the…
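
As a rough illustration of the generation-based part of such an evaluation, the sketch below scores a model's persona-conditioned answers on Ishihara-style plates. It is not the authors' code or protocol. The names `PlateTrial`, `classify`, and `summarize` are hypothetical, the plate readings are illustrative placeholders rather than data from the paper, and the sketch assumes each plate is diagnostic, meaning the typical and deficient readings differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateTrial:
    """One hypothetical trial: the model is asked to answer as a viewer with a deficiency."""
    plate_id: str
    expected_typical: str  # reading expected from a viewer with typical color vision
    expected_cvd: str      # reading expected from a viewer with the simulated deficiency
    model_answer: str      # the model's answer under the deficiency persona


def classify(trial: PlateTrial) -> str:
    """Label an answer as matching the deficient reading, the typical reading, or neither."""
    answer = trial.model_answer.strip().lower()
    if answer == trial.expected_cvd.strip().lower():
        return "cvd"
    if answer == trial.expected_typical.strip().lower():
        return "typical"
    return "other"


def summarize(trials: list[PlateTrial]) -> dict[str, float]:
    """Return the fraction of trials that fall into each label."""
    counts = {"cvd": 0, "typical": 0, "other": 0}
    for trial in trials:
        counts[classify(trial)] += 1
    total = len(trials)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: count / total for label, count in counts.items()}


# Illustrative placeholder trials, not results from the paper.
trials = [
    PlateTrial("p1", expected_typical="8", expected_cvd="3", model_answer="8"),
    PlateTrial("p2", expected_typical="29", expected_cvd="70", model_answer="70"),
    PlateTrial("p3", expected_typical="5", expected_cvd="2", model_answer="nothing"),
    PlateTrial("p4", expected_typical="74", expected_cvd="21", model_answer=" 74 "),
]
print(summarize(trials))
```

Under these assumptions, a high "typical" fraction would indicate that the model keeps reporting what a viewer with typical color vision sees even when asked to answer as a viewer with a deficiency, which is the kind of failure the findings describe.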
