GamiBench: Evaluating Spatial Reasoning and 2D-to-3D Planning Capabilities of MLLMs with Origami Folding Tasks
Ryan Spencer, Roey Yaari, Ritvik Vemavarapu, Joyce Yang, Steven Ngo, Utkarsh Sharma

TL;DR
GamiBench is a comprehensive benchmark for evaluating multimodal large language models' spatial reasoning and 2D-to-3D planning abilities using origami folding tasks, highlighting current model limitations.
Contribution
Introduces GamiBench, a novel benchmark with new metrics for assessing spatial reasoning and 2D-to-3D planning in MLLMs through origami-inspired tasks.
Findings
Leading models like GPT-5 and Gemini-2.5-Pro struggle with spatial understanding.
GamiBench measures cross-view consistency and physical feasibility.
The benchmark provides a standardized framework for geometric reasoning evaluation.
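As a rough illustration of what a cross-view consistency score could look like, the sketch below computes the fraction of crease patterns that a model answers correctly from every viewpoint it was shown. This is an assumed formulation for illustration only, not the metric defined by the GamiBench authors; the function name `all_views_correct_rate`, the record layout, and the sample item IDs are all hypothetical.

```python
from collections import defaultdict


def all_views_correct_rate(records):
    """Fraction of items answered correctly from every viewpoint.

    `records` is an iterable of (item_id, viewpoint, is_correct) tuples.
    This layout is a hypothetical one chosen for the sketch, not the
    benchmark's actual data format.
    """
    per_item = defaultdict(list)
    for item_id, _viewpoint, is_correct in records:
        per_item[item_id].append(bool(is_correct))
    if not per_item:
        return 0.0
    # An item counts as consistent only if all of its views are correct.
    consistent = sum(1 for flags in per_item.values() if all(flags))
    return consistent / len(per_item)


# Made-up sample data: two crease patterns, two viewpoints each.
records = [
    ("crease_001", "front", True),
    ("crease_001", "top", True),
    ("crease_002", "front", True),
    ("crease_002", "top", False),
]
rate = all_views_correct_rate(records)
print(rate)
```

A score of this kind penalizes a model that is right from one angle but wrong from another, which per-question accuracy alone would not reveal.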
Abstract
Multimodal large language models (MLLMs) are proficient in perception and instruction-following, but they still struggle with spatial reasoning: the ability to mentally track and manipulate objects across multiple views and over time. Spatial reasoning is a key component of human intelligence, but most existing benchmarks focus on static images or final outputs, failing to account for the sequential and viewpoint-dependent nature of this skill. To close this gap, we introduce GamiBench, a benchmark designed to evaluate spatial reasoning and 2D-to-3D planning in MLLMs through origami-inspired folding tasks. GamiBench includes 186 regular and 186 impossible 2D crease patterns paired with their corresponding 3D folded shapes, produced from six distinct viewpoints across three visual question-answering (VQA) tasks: predicting 3D fold configurations, distinguishing valid viewpoints, and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Spatial Cognition and Navigation
