When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models
Qianpu Chen, Derya Soydaner, Rob Saunders

TL;DR
This paper investigates how different vision models interpret ambiguous face-like patterns and proposes a diagnostic framework for analyzing their behavior under ambiguity, revealing distinct, model-specific interpretation mechanisms and biases.
Contribution
It introduces a unified diagnostic framework for analyzing vision models' responses to ambiguous face pareidolia, revealing model-specific mechanisms and biases.
Findings
Vision-language models (VLMs) show semantic overactivation, over-calling face-like patterns by pulling ambiguous non-human regions toward the Human concept.
ViT employs uncertainty-based abstention, remaining unbiased.
Detection models suppress pareidolia through conservative priors.
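The abstention finding for ViT can be made concrete with a generic entropy-based abstention rule. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes a softmax classifier over a hypothetical label set ("human", "non-human") and a hypothetical threshold `max_entropy_frac` on normalized predictive entropy. When the predicted distribution is too close to uniform, the model returns no label instead of committing to one.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_abstain(
    logits: Sequence[float],
    labels: Sequence[str],
    max_entropy_frac: float = 0.9,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return the top label, or None (abstain) if normalized entropy is too high."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if len(probs) < 2:
        return labels[best]  # a single class carries no uncertainty to measure
    # Normalize by the maximum possible entropy, log(K), so the score lies in [0, 1].
    normalized = entropy(probs) / math.log(len(probs))
    if normalized > max_entropy_frac:
        return None  # evidence too ambiguous: abstain rather than over-call
    return labels[best]


if __name__ == "__main__":
    labels = ["human", "non-human"]  # hypothetical label set
    print(predict_or_abstain([5.0, 0.0], labels))  # confident prediction
    print(predict_or_abstain([0.1, 0.0], labels))  # near-uniform: abstains
```

Under this rule, a confident logit gap yields a label while a near-uniform distribution yields `None`. An over-activating model would instead commit to "human" even on weak evidence.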
Abstract
When visual evidence is ambiguous, vision models must decide whether to interpret face-like patterns as meaningful. Face pareidolia, the perception of faces in non-face objects, provides a controlled probe of this behavior. We introduce a representation-level diagnostic framework that analyzes detection, localization, uncertainty, and bias across class, difficulty, and emotion in face pareidolia images. Under a unified protocol, we evaluate six models spanning four representational regimes: vision-language models (VLMs; CLIP-B/32, CLIP-L/14, LLaVA-1.5-7B), pure vision classification (ViT), general object detection (YOLOv8), and face detection (RetinaFace). Our analysis reveals three mechanisms of interpretation under ambiguity. VLMs exhibit semantic overactivation, systematically pulling ambiguous non-human regions toward the Human concept, with LLaVA-1.5-7B producing the strongest and…
Taxonomy
Topics: Face Recognition and Perception · Face recognition and analysis · Visual Attention and Saliency Detection
