DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry

Changti Wu; Shijie Lian; Zihao Liu; Lei Zhang; Laurence Tianruo Yang; Kai Chen

arXiv:2510.22340·cs.AI·November 12, 2025



TL;DR

DynaSolidGeo is a novel dynamic benchmark designed to evaluate genuine spatial reasoning in Vision-Language Models for solid geometry, addressing limitations of static datasets and answer-only evaluation methods.

Contribution

The paper introduces the first dynamic benchmark for spatial reasoning in VLMs, built with a semi-automatic annotation pipeline. It adds process evaluation, so models are assessed on their reasoning chains as well as on their final answers.

Findings

01. Large performance gaps across models
02. Severe degradation in dynamic settings
03. Poor performance on high-level spatial tasks

Abstract

Solid geometry problem solving demands spatial mathematical reasoning that integrates spatial intelligence and symbolic reasoning. However, most existing multimodal mathematical reasoning benchmarks focus primarily on 2D plane geometry, rely on static datasets prone to data contamination and memorization, and evaluate models solely by final answers, overlooking the reasoning process. To address these limitations, we introduce DynaSolidGeo, the first dynamic benchmark for evaluating genuine spatial reasoning in Vision-Language Models (VLMs). Constructed through a semi-automatic annotation pipeline, DynaSolidGeo contains 503 expert-curated seed questions that can, in principle, dynamically generate an unbounded number of diverse multimodal text-visual instances. Beyond answer accuracy, we incorporate process evaluation based on expert-annotated reasoning chains to measure logical validity…
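To make the idea of a seed question concrete, here is a minimal sketch of how one parameterized seed could yield many distinct instances. It is an illustration only, not the authors' pipeline: the `Instance` class, the `cube_diagonal_seed` and `generate` functions, the cube problem, and the edge-length range are all hypothetical, and the sketch covers only the text side (the benchmark's instances also include rendered figures, which are omitted here).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """One generated problem: question text plus its ground-truth answer."""
    question: str
    answer: float


def cube_diagonal_seed(rng: random.Random) -> Instance:
    """Hypothetical seed question: space diagonal of a cube with a random edge."""
    edge = rng.randint(2, 20)  # assumed range; re-sampled for every instance
    question = (
        f"Cube ABCD-A1B1C1D1 has edge length {edge}. "
        "Find the length of the space diagonal AC1."
    )
    # The space diagonal of a cube with edge a is a * sqrt(3).
    return Instance(question=question, answer=edge * math.sqrt(3))


def generate(seed_fn, n: int, base_seed: int = 0) -> list[Instance]:
    """Draw n instances from one seed question, reproducibly for a given base_seed."""
    rng = random.Random(base_seed)
    return [seed_fn(rng) for _ in range(n)]


instances = generate(cube_diagonal_seed, n=3)
for inst in instances:
    print(inst.question, "->", round(inst.answer, 3))
```

Because the answer is computed from the sampled parameter rather than stored, a model cannot score well by memorizing one fixed question-answer pair. Changing `base_seed` produces a fresh set of instances from the same seed question.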


Code & Models

Datasets

MaplesWCT/DynaSolidGeo-Sample
dataset · 16 downloads
