Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code

Haobo Lin; Tianyi Bai; Chen Chen; Jiajun Zhang; Bohan Zeng; Wentao Zhang; Binhang Yuan

arXiv:2602.18745 · cs.CV · February 24, 2026

TL;DR

This paper introduces GeoCode, a synthetic multimodal geometry dataset that enhances visual reasoning and alignment in models by combining symbolic problem generation, verification, and code-based diagram rendering.

Contribution

The paper presents a novel pipeline for creating complex geometry datasets from scratch and introduces code prediction as an explicit alignment objective for improved visual reasoning.

Findings

01. Models trained on GeoCode outperform existing benchmarks.

02. GeoCode exhibits higher structural complexity and reasoning difficulty.

03. The proposed alignment strategy improves model performance.

Abstract

Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data and weak visual-symbolic alignment. We propose a pipeline for synthesizing complex multimodal geometry problems from scratch and construct a dataset named GeoCode, which decouples problem generation into symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering, ensuring consistency across structure, text, reasoning, and images. Leveraging the plotting code provided in GeoCode, we further introduce code prediction as an explicit alignment objective, transforming visual understanding into a supervised structured prediction task. GeoCode exhibits substantially higher structural complexity and…
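
The three stages named in the abstract (symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering) can be illustrated with a toy sketch. This is not the authors' implementation: the `Seed` class, the function names, the right-triangle example, and the sample layout are all assumptions made for illustration. The emitted plotting code is returned as a string, which is the kind of target a code-prediction objective could supervise.

```python
import math
from dataclasses import dataclass


@dataclass
class Seed:
    """Hypothetical symbolic seed: right triangle ABC with the right angle at B."""
    leg_ab: float = 3.0
    leg_bc: float = 4.0


def instantiate(seed):
    """Ground the symbolic seed in coordinates, placing B at the origin."""
    return {"A": (0.0, seed.leg_ab), "B": (0.0, 0.0), "C": (seed.leg_bc, 0.0)}


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def verify(seed, coords, tol=1e-9):
    """Check that the grounded coordinates satisfy the seed's constraints."""
    a, b, c = coords["A"], coords["B"], coords["C"]
    # Dot product of BA and BC must vanish for a right angle at B.
    dot = (a[0] - b[0]) * (c[0] - b[0]) + (a[1] - b[1]) * (c[1] - b[1])
    return (
        abs(dist(a, b) - seed.leg_ab) < tol
        and abs(dist(b, c) - seed.leg_bc) < tol
        and abs(dot) < tol
    )


def render_code(coords):
    """Emit matplotlib plotting code as text (assumed format, not executed here)."""
    names = list(coords)
    # Repeat the first vertex so the plotted outline is closed.
    xs = [coords[n][0] for n in names] + [coords[names[0]][0]]
    ys = [coords[n][1] for n in names] + [coords[names[0]][1]]
    lines = [
        "import matplotlib.pyplot as plt",
        "fig, ax = plt.subplots()",
        f"ax.plot({xs}, {ys}, color='black')",
    ]
    for n in names:
        x, y = coords[n]
        lines.append(f"ax.annotate({n!r}, ({x}, {y}))")
    lines += ["ax.set_aspect('equal')", "ax.axis('off')", "fig.savefig('problem.png')"]
    return "\n".join(lines)


def build_sample(seed):
    """Run seed -> instantiate -> verify -> render and return one toy sample."""
    coords = instantiate(seed)
    if not verify(seed, coords):
        raise ValueError("grounded instance violates the symbolic constraints")
    return {
        "question": "In right triangle ABC with the right angle at B, "
                    f"AB = {seed.leg_ab} and BC = {seed.leg_bc}. Find AC.",
        "answer": dist(coords["A"], coords["C"]),
        "plot_code": render_code(coords),  # target text for code prediction
    }
```

Because the question text, the answer, and the plotting code are all derived from the same verified coordinates, the sketch keeps the modalities consistent by construction, which is the property the abstract attributes to decoupling the stages.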

Taxonomy

Topics: Data Visualization and Analytics · 3D Shape Modeling and Analysis · Handwritten Text Recognition Techniques