TL;DR
This paper investigates whether large vision-language models (LVLMs) can simulate the differences in human color perception caused by color vision deficiencies. It finds that the models lack mechanisms for representing alternative perceptual experiences, a gap that matters for accessibility applications.
Contribution
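The study introduces a novel evaluation of whether LVLMs can model the perceptual variation caused by color vision deficiencies. It uses Ishihara Test plates as controlled stimuli and examines model generation, confidence, and internal representations, highlighting current limitations.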
Findings
Models possess factual knowledge about color vision deficiencies.
Models fail to replicate the perceptual differences experienced by affected individuals.
Current systems lack mechanisms for representing alternative perceptual experiences.
Abstract
Large-scale Vision-Language Models (LVLMs) are being deployed in real-world settings that require visual inference. As capabilities improve, applications in navigation, education, and accessibility are becoming practical. These settings require accommodation of perceptual variation rather than assuming a uniform visual experience. Color perception illustrates this requirement: it is central to visual understanding yet varies across individuals due to Color Vision Deficiencies, an aspect largely ignored in multimodal AI. In this work, we examine whether LVLMs can account for variation in color perception using the Ishihara Test. We evaluate model behavior through generation, confidence, and internal representation, using Ishihara plates as controlled stimuli that expose perceptual differences. Although models possess factual knowledge about color vision deficiencies and can describe the…