MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations

Nilay Yilmaz; Maitreya Patel; Naga Sai Abhiram Kusumba; Yixuan He; Yezhou Yang

arXiv:2602.19357 · cs.CV · February 24, 2026

TL;DR

This paper introduces MentalBlackboard, a benchmark to evaluate Vision-Language Models' spatial visualization abilities through Paper Folding and Hole Punching tasks, revealing significant limitations in symmetry and rotation understanding.

Contribution

The paper presents MentalBlackboard, a benchmark for assessing spatial visualization in VLMs, and shows that these models struggle with symmetry, rotation, and multi-stage spatial reasoning.

Findings

01 Models struggle to apply symmetrical transformations.

02 Rotations significantly challenge models' physical situational awareness.

03 Top models achieve only 10-25% accuracy on prediction and planning tasks.

Abstract

Spatial visualization is the mental ability to imagine, transform, and manipulate the spatial characteristics of objects and actions. This intelligence is a part of human cognition where actions and perception are connected on a mental level. To explore whether state-of-the-art Vision-Language Models (VLMs) exhibit this ability, we develop MentalBlackboard, an open-ended spatial visualization benchmark for Paper Folding and Hole Punching tests within two core tasks: prediction and planning. Our prediction experiments reveal that models struggle with applying symmetrical transformations, even when they predict the sequence of unfolding steps correctly. Also, rotations introduce a significant challenge to the physical situational awareness for models. The planning task reveals limitations of models in analyzing symmetrical relationships and in implementing the multi-stage symmetry…
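To make the symmetry operation in the abstract concrete, the following is a minimal sketch of how unfolding mirrors punched holes across fold lines. It is an illustration only, not the benchmark's implementation: the unit-square coordinate system, the `reflect` and `unfold` names, and the fold encoding `(axis, position)` are all assumptions made here. It also assumes every fold is an axis-aligned fold and that the punch passes through all layers, so each unfolding step doubles the set of holes.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Fold = Tuple[str, float]  # hypothetical encoding: ("vertical" | "horizontal", line position)


def reflect(point: Point, axis: str, position: float) -> Point:
    """Mirror a point across an axis-aligned fold line."""
    x, y = point
    if axis == "vertical":      # fold line is x = position
        return (2 * position - x, y)
    if axis == "horizontal":    # fold line is y = position
        return (x, 2 * position - y)
    raise ValueError(f"unknown fold axis: {axis}")


def unfold(holes: List[Point], folds: List[Fold]) -> List[Point]:
    """Undo the folds in reverse order, mirroring every hole at each step.

    Assumes each punch goes through all layers of the folded paper.
    """
    result = list(holes)
    for axis, position in reversed(folds):
        mirrored = [reflect(p, axis, position) for p in result]
        # Keep holes that lie exactly on the fold line only once.
        result = result + [m for m in mirrored if m not in result]
    return result


# Fold the unit square in half twice, punch once, then unfold.
folds = [("vertical", 0.5), ("horizontal", 0.5)]
final_holes = sorted(unfold([(0.75, 0.75)], folds))
print(final_holes)
```

Under these assumptions, a single punch after two half folds yields four holes placed symmetrically about both fold lines. Predicting the correct unfolding order is not enough to get this result; the reflection must also be applied correctly at every step, which is the part the abstract reports models getting wrong.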

Code & Models

Datasets

nlylmz/MentalBlackboard · dataset · 9.8k dl

Taxonomy

Topics: Data Visualization and Analytics · Spatial Cognition and Navigation · Multimodal Machine Learning Applications