Contrast Sensitivity in Multimodal Large Language Models: A Psychophysics-Inspired Evaluation
Pablo Hernández-Cámara, Alexandra Gomez-Villa, Jose Manuel Jaén-Lorites, Jorge Vila-Tomás, Valero Laparra, Jesus Malo

TL;DR
This paper introduces a psychophysics-inspired method to evaluate the contrast sensitivity of multimodal large language models, revealing their perceptual limitations and variability in frequency tuning.
Contribution
It presents a novel behavioral approach to estimate contrast sensitivity functions in MLLMs, enabling systematic assessment of their low-level visual processing capabilities.
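The approach presents models with noise-based stimuli filtered at specific spatial frequencies. The sketch below shows one plausible way to build such a stimulus. Everything in it is an assumption for illustration and not the authors' exact procedure: the hypothetical `bandpass_noise` helper, the log-Gaussian filter, the one-octave default bandwidth, the 0.5 mean luminance, and the use of RMS contrast (std/mean of luminance) as the contrast measure.

```python
from typing import Optional

import numpy as np


def bandpass_noise(size: int, center_cpi: float, rms_contrast: float,
                   bandwidth_oct: float = 1.0, mean_lum: float = 0.5,
                   seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical stimulus generator: Gaussian white noise band-pass filtered
    around `center_cpi` (cycles per image), returned as luminance in [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))

    # Radial spatial frequency of every FFT coefficient, in cycles per image
    freqs = np.fft.fftfreq(size) * size
    fx, fy = np.meshgrid(freqs, freqs)
    radius = np.hypot(fx, fy)

    # Log-Gaussian filter: Gaussian in log2(frequency), FWHM = bandwidth_oct octaves
    sigma_oct = bandwidth_oct / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    with np.errstate(divide="ignore"):  # log2(0) at the DC term -> -inf -> gain 0
        octaves_from_center = np.log2(radius / center_cpi)
    gain = np.exp(-0.5 * (octaves_from_center / sigma_oct) ** 2)
    gain[0, 0] = 0.0  # drop the DC term so the pattern has zero mean

    pattern = np.real(np.fft.ifft2(np.fft.fft2(noise) * gain))
    pattern /= pattern.std()  # unit RMS (the mean is already zero)

    # RMS contrast = std(L) / mean(L); exact as long as no clipping occurs
    image = mean_lum * (1.0 + rms_contrast * pattern)
    return np.clip(image, 0.0, 1.0)
```

For example, `bandpass_noise(256, center_cpi=16, rms_contrast=0.05, seed=1)` gives one low-contrast image centred at 16 cycles per image. Sweeping `rms_contrast` at a fixed `center_cpi` would produce the stimulus set for one point of a CSF. Converting the array to an 8-bit image file for the model is left out of this sketch.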
Findings
Some models resemble human CSFs in shape or scale
CSF estimates are highly sensitive to prompt phrasing
CSFs predict model performance under various conditions
Abstract
Understanding how Multimodal Large Language Models (MLLMs) process low-level visual features is critical for evaluating their perceptual abilities, yet this processing has not been systematically characterized. Inspired by human psychophysics, we introduce a behavioural method for estimating the Contrast Sensitivity Function (CSF) in MLLMs by treating them as end-to-end observers. Models are queried with structured prompts while viewing noise-based stimuli filtered at specific spatial frequencies. Psychometric functions are derived from the binary verbal responses, and contrast thresholds (and CSFs) are obtained without relying on internal activations or classifier-based proxies. Our results reveal that some models resemble human CSFs in shape or scale, but none capture both. We also find that CSF estimates are highly sensitive to prompt phrasing, indicating limited linguistic robustness. Finally, we…
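A minimal sketch of the step from binary verbal responses to a CSF follows. It rests on several assumptions: a yes/no detection prompt, a logistic psychometric function in log10 contrast with zero guess rate and a fixed 2% lapse rate, a threshold defined as the midpoint of that function, and sensitivity taken as 1/threshold. The paper's actual prompt, psychometric model and fitting procedure may differ. The functions `parse_yes_no`, `fit_threshold`, `contrast_sensitivity` and `synthetic_trials` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np


def parse_yes_no(reply: str) -> Optional[int]:
    """Map a free-text model reply to 1 ("yes"), 0 ("no") or None (unparseable)."""
    tokens = reply.lower().split()
    first = tokens[0].strip(".,!;:\"'") if tokens else ""
    if first == "yes":
        return 1
    if first == "no":
        return 0
    return None  # e.g. "Not sure": discard or re-query


def fit_threshold(contrasts: Sequence[float], responses: Sequence[int],
                  guess: float = 0.0, lapse: float = 0.02) -> Tuple[float, float]:
    """Grid-search maximum-likelihood fit of
        p(yes | c) = guess + (1 - guess - lapse) * logistic(beta * (log10 c - alpha))
    Returns (threshold, beta) with threshold = 10**alpha, the curve's midpoint."""
    x = np.log10(np.asarray(contrasts, dtype=float))
    y = np.asarray(responses, dtype=float)

    # Pool trials per contrast level: n presentations, k "yes" replies
    levels, idx = np.unique(x, return_inverse=True)
    n = np.bincount(idx).astype(float)
    k = np.bincount(idx, weights=y)

    alphas = np.linspace(levels.min() - 1.0, levels.max() + 1.0, 401)
    betas = np.logspace(-1, 2, 121)

    # Shape (alpha, beta, level); tanh form of the logistic avoids exp overflow
    z = betas[None, :, None] * (levels[None, None, :] - alphas[:, None, None])
    p = guess + (1.0 - guess - lapse) * 0.5 * (1.0 + np.tanh(0.5 * z))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)

    # Binomial log-likelihood (constant term omitted), summed over levels
    loglik = (k * np.log(p) + (n - k) * np.log1p(-p)).sum(axis=-1)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return float(10.0 ** alphas[i]), float(betas[j])


def contrast_sensitivity(
        trials: Dict[float, Tuple[Sequence[float], Sequence[int]]]) -> Dict[float, float]:
    """{spatial_frequency: (contrasts, responses)} -> {spatial_frequency: 1 / threshold}."""
    return {f: 1.0 / fit_threshold(c, r)[0] for f, (c, r) in trials.items()}


def synthetic_trials(threshold: float, beta: float, lapse: float = 0.02,
                     n_per_level: int = 100) -> Tuple[List[float], List[int]]:
    """Deterministic fake observer for exercising the fit (NOT model data)."""
    levels = np.logspace(-3, 0, 10)
    p = (1.0 - lapse) / (1.0 + np.exp(-beta * (np.log10(levels) - np.log10(threshold))))
    contrasts: List[float] = []
    responses: List[int] = []
    for c, pc in zip(levels, p):
        n_yes = int(round(n_per_level * pc))
        contrasts += [float(c)] * n_per_level
        responses += [1] * n_yes + [0] * (n_per_level - n_yes)
    return contrasts, responses


if __name__ == "__main__":
    trials = {f: synthetic_trials(threshold=t, beta=8.0)
              for f, t in [(2.0, 0.05), (8.0, 0.01), (32.0, 0.1)]}
    for f, s in contrast_sensitivity(trials).items():
        print(f"{f:5.1f} cycles/image: sensitivity ~ {s:.1f}")
```

The synthetic observer only exercises the fitting code. In an actual run, each (contrast, spatial frequency) stimulus would be shown to the MLLM, and its reply would pass through `parse_yes_no` before fitting. A coarse grid search stands in for a gradient-based optimizer here because it keeps the sketch NumPy-only and cannot diverge. A finer grid or a local refinement step would give more precise thresholds.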
