Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?
Filippo Ziliotto, Ciro Beneduce, Bruno Lepri, Luciano Serafini, Massimiliano Luca, Tommaso Campari

TL;DR
This paper investigates whether embodied vision-language models can recognize themselves in mirrors, introducing a benchmark to assess self-identification and grounding in perception and action.
Contribution
The study presents a controlled 3D benchmark for evaluating mirror-based self-identification in VLM agents. It finds that this capability emerges mainly in stronger models and highlights the importance of perception-grounded self-awareness.
Findings
Stronger VLMs can use reflected evidence for self-identification.
Weaker models often fail to extract self-relevant information or misattribute reflections.
Mirror-based evaluation distinguishes perception-grounded self-awareness from reliance on priors or prompts.
Abstract
In the animal kingdom, mirror self-recognition is a canonical probe of higher-order cognition, emerging only in some species. We ask whether an analogous functional capability emerges in embodied vision-language model (VLM) agents: can they recognize themselves in a mirror? We introduce a controlled 3D benchmark where a first-person VLM agent must infer a hidden body attribute from its reflection and select the matching target, while avoiding self-other misattribution. To separate mirror-grounded self-identification from shortcuts, we test mirror removal, misleading cues, and occluded reflections. We also evaluate the decision process through mirror seeking, temporal ordering, self-attribution, and reasoning-action consistency. Our experiments show that mirror-based self-identification emerges mainly in stronger VLMs. These models can use reflected evidence for action, whereas weaker…
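
To make the role of the control conditions concrete, here is a minimal sketch of how per-condition success could be tallied. It is not the authors' code: the `Episode` fields, the condition labels, and the toy episodes are all hypothetical, and the numbers are made up for illustration rather than taken from the paper. The idea is that an agent that really uses its reflection should succeed with the mirror present and fall toward chance once it is removed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    # All field names and values here are hypothetical, for illustration only.
    condition: str       # e.g. "mirror" or "no_mirror"
    true_attribute: str  # the agent's hidden body attribute
    chosen_target: str   # attribute of the target the agent selected


def success_rate_by_condition(episodes):
    """Return the fraction of episodes per condition where the chosen target matches."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ep in episodes:
        totals[ep.condition] += 1
        if ep.chosen_target == ep.true_attribute:
            hits[ep.condition] += 1
    return {cond: hits[cond] / totals[cond] for cond in totals}


# Made-up toy episodes, not results from the paper.
episodes = [
    Episode("mirror", "red", "red"),
    Episode("mirror", "blue", "blue"),
    Episode("no_mirror", "red", "green"),
    Episode("no_mirror", "blue", "blue"),
]
rates = success_rate_by_condition(episodes)
```

Comparing `rates["mirror"]` against `rates["no_mirror"]` gives a rough sense of how much the agent depends on reflected evidence rather than on priors or prompt cues. The same tally would extend to the misleading-cue and occluded-reflection conditions by adding further condition labels.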
