Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models

Nicoleta-Nina Basoc; Adrian Cosma; Emilian Radoi

arXiv:2603.06141 · cs.CV · March 9, 2026

Open Access

TL;DR

This paper investigates the perceptual weaknesses of vision-language models under structured colour distortions, revealing significant accuracy drops and proposing perception-aware preprocessing to enhance robustness.

Contribution

The paper introduces a framework of eight spatial colour mixing distortions, measures how VLM accuracy degrades under them, and suggests perception-aware preprocessing as a practical way to improve robustness.

Findings

01 VLM accuracy declines sharply as colour distortion increases

02 Scaling the language model does not reliably improve robustness

03 Perception-aware preprocessing recovers a significant amount of the lost accuracy

Abstract

Vision-language models (VLMs) achieve strong benchmark results, yet can exhibit systematic perceptual weaknesses: structured, large changes to pixel values can cause confident yet nonsensical predictions, even when the underlying scene remains easily recognizable to humans. We study this gap using Spatial Colour Mixing, a programmatic family of colour distortions that overlays structured patterns (in both RGB and Ostwald colour systems) onto natural images. We introduce a framework of eight spatial colour mixing variants and evaluate nine VLMs across three model families on four datasets. Across models and datasets, accuracy degrades sharply with increasing distortion, and scaling the language model does not reliably mitigate the failure. In a human study with 61 participants on an animal recognition dataset, humans substantially outperform VLMs under the same distortions. Finally, we…
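
To make the idea of overlaying a structured colour pattern on a natural image concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the eight variants, so the striped RGB pattern, the linear blend, and the names `stripe_pattern`, `mix`, `period`, and `strength` are all assumptions chosen for illustration only.

```python
import numpy as np


def stripe_pattern(height, width, period=4):
    """Hypothetical structured pattern: vertical stripes cycling through pure R, G, B.

    Assumption for illustration; the paper's actual patterns are not given in the abstract.
    """
    primaries = np.array(
        [[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float32
    )
    # Map each column to one of the three primaries, switching every `period` pixels.
    stripe_index = (np.arange(width) // period) % 3
    row = primaries[stripe_index]  # shape (width, 3)
    # Repeat the same row for every image row.
    return np.broadcast_to(row, (height, width, 3)).copy()


def mix(image, pattern, strength):
    """Linearly blend an RGB uint8 image with a pattern of the same shape.

    strength=0.0 returns the image unchanged; strength=1.0 returns the pattern.
    The linear blend is an assumed stand-in for "increasing distortion".
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if image.shape != pattern.shape:
        raise ValueError("image and pattern must have the same shape")
    blended = (1.0 - strength) * image.astype(np.float32) + strength * pattern
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # A flat grey test image stands in for a natural photograph.
    grey = np.full((8, 12, 3), 100, dtype=np.uint8)
    pattern = stripe_pattern(8, 12, period=4)
    for strength in (0.0, 0.2, 0.6, 1.0):
        distorted = mix(grey, pattern, strength)
        print(strength, distorted[0, 0].tolist())
```

Sweeping `strength` from 0 to 1 and scoring a model at each level would give an accuracy-versus-distortion curve of the kind the abstract describes, under the assumptions stated above.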

Taxonomy

Topics: Multimodal Machine Learning Applications · Categorization, perception, and language · Generative Adversarial Networks and Image Synthesis