TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning

Daixian Liu; Jiayi Kuang; Yinghui Li; Yangning Li; Di Yin; Haoyu Cao; Xing Sun; Ying Shen; Hai-Tao Zheng; Liang Lin; Philip S. Yu

arXiv:2601.16520 · cs.CV · January 26, 2026

TL;DR

This paper introduces TangramPuzzle, a geometry-grounded benchmark built on the classic Tangram game that evaluates how precisely multimodal large language models (MLLMs) perform compositional spatial reasoning.

Contribution

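The paper presents the TangramPuzzle benchmark and the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. Together with the benchmark's tasks, this enables rigorous evaluation of spatial reasoning in MLLMs.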

Findings

01. MLLMs often prioritize silhouette matching over geometric constraints.

02. Models tend to produce distorted or deformed tangram pieces.

03. The benchmark reveals gaps in current MLLMs' spatial reasoning capabilities.
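
To make the notion of a "distorted or deformed" piece concrete, the following is a minimal sketch of one way such a check could be done for triangular pieces. It is not the paper's TCE format or evaluation metric: the vertex-list representation and the helper names `squared_side_lengths` and `is_congruent_triangle` are assumptions made for illustration. The idea is that a triangle placed by a model is undistorted only if its side lengths match the reference piece (the side-side-side congruence criterion), regardless of where it is moved or how it is rotated or flipped.

```python
from fractions import Fraction


def squared_side_lengths(vertices):
    """Return the sorted squared edge lengths of a polygon.

    `vertices` is a list of (x, y) pairs in drawing order (hypothetical
    representation, not the paper's TCE syntax). Fractions keep the
    arithmetic exact, so no floating-point tolerance is needed.
    """
    n = len(vertices)
    lengths = []
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        dx = Fraction(x1) - Fraction(x0)
        dy = Fraction(y1) - Fraction(y0)
        lengths.append(dx * dx + dy * dy)
    return sorted(lengths)


def is_congruent_triangle(candidate, reference):
    """Check side-side-side congruence of two triangles.

    Translation, rotation, and reflection all preserve side lengths,
    so only stretching or shearing makes this return False.
    """
    if len(candidate) != 3 or len(reference) != 3:
        raise ValueError("this sketch only handles triangles")
    return squared_side_lengths(candidate) == squared_side_lengths(reference)


# Reference piece: an isosceles right triangle with legs of length 2.
reference = [(0, 0), (2, 0), (0, 2)]

# The same piece rotated by 90 degrees and translated.
moved = [(5, 5), (5, 7), (3, 5)]

# A piece whose horizontal leg was stretched from 2 to 3.
stretched = [(0, 0), (3, 0), (0, 2)]

print(is_congruent_triangle(moved, reference))      # True
print(is_congruent_triangle(stretched, reference))  # False
```

The sketch is limited to triangles because side lengths alone do not determine a quadrilateral; checking the square and parallelogram pieces would need an extra constraint, such as angles or diagonals.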

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable progress in visual recognition and semantic understanding. Nevertheless, their ability to perform precise compositional spatial reasoning remains largely unexplored. Existing benchmarks often involve relatively simple tasks and rely on semantic approximations or coarse relative positioning, while their evaluation metrics are typically limited and lack rigorous mathematical formulations. To bridge this gap, we introduce TangramPuzzle, a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications, to mitigate the ambiguity of visual approximation. We design two complementary tasks: Outline…

Taxonomy

Topics: Multimodal Machine Learning Applications · Spatial Cognition and Navigation · Data Visualization and Analytics