MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations
Nilay Yilmaz, Maitreya Patel, Naga Sai Abhiram Kusumba, Yixuan He, Yezhou Yang

TL;DR
This paper introduces MentalBlackboard, a benchmark to evaluate Vision-Language Models' spatial visualization abilities through Paper Folding and Hole Punching tasks, revealing significant limitations in symmetry and rotation understanding.
Contribution
The paper presents MentalBlackboard, a novel benchmark for assessing spatial visualization in VLMs, highlighting their challenges in symmetry, rotation, and multi-stage spatial reasoning tasks.
Findings
Models struggle with symmetrical transformations.
Rotation tasks significantly challenge physical awareness.
Top models achieve only 10-25% accuracy on prediction and planning tasks.
Abstract
Spatial visualization is the mental ability to imagine, transform, and manipulate the spatial characteristics of objects and actions. This intelligence is a part of human cognition where actions and perception are connected on a mental level. To explore whether state-of-the-art Vision-Language Models (VLMs) exhibit this ability, we develop MentalBlackboard, an open-ended spatial visualization benchmark for Paper Folding and Hole Punching tests within two core tasks: prediction and planning. Our prediction experiments reveal that models struggle with applying symmetrical transformations, even when they predict the sequence of unfolding steps correctly. Also, rotations introduce a significant challenge to the physical situational awareness for models. The planning task reveals limitations of models in analyzing symmetrical relationships and in implementing the multi-stage symmetry…
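To make the "symmetrical transformations" in the abstract concrete, here is a minimal sketch of how unfolding a punched sheet mirrors each hole across the fold lines, undone in reverse order. It is an illustration only, not the benchmark's implementation: the function names (`reflect`, `unfold`), the unit-square coordinates, and the restriction to axis-aligned folds where the punch passes through every layer are all assumptions made for this example. Rotations and diagonal folds are not modeled.

```python
# Hypothetical illustration: holes on a unit-square sheet, axis-aligned folds only.
# Assumes each fold covers the whole folded stack and the punch goes through all layers.

def reflect(point, axis, position):
    """Mirror a point across a fold line.

    axis == "vertical"   means the fold line is x = position.
    axis == "horizontal" means the fold line is y = position.
    """
    x, y = point
    if axis == "vertical":
        return (2 * position - x, y)
    if axis == "horizontal":
        return (x, 2 * position - y)
    raise ValueError(f"unknown axis: {axis!r}")


def unfold(holes, folds):
    """Return all hole positions on the fully unfolded sheet.

    `folds` lists the folds in the order they were applied; unfolding
    undoes them last-to-first, and each undo adds the mirror image of
    every hole present so far.
    """
    result = set(holes)
    for axis, position in reversed(folds):
        result |= {reflect(p, axis, position) for p in result}
    return result


# Fold the left half over the right, then the top half over the bottom,
# and punch once in the remaining quarter.
folds = [("vertical", 0.5), ("horizontal", 0.5)]
print(sorted(unfold({(0.75, 0.25)}, folds)))
```

With two folds, the single punch yields four holes, one in each quadrant. Under these assumptions each fold doubles the hole count, so a model that orders the unfolding steps correctly can still fail if it misapplies any one reflection.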
Taxonomy
Topics: Data Visualization and Analytics · Spatial Cognition and Navigation · Multimodal Machine Learning Applications
