Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
Haobo Lin, Tianyi Bai, Chen Chen, Jiajun Zhang, Bohan Zeng, Wentao Zhang, Binhang Yuan

TL;DR
This paper introduces GeoCode, a synthetic multimodal geometry dataset that enhances visual reasoning and alignment in models by combining symbolic problem generation, verification, and code-based diagram rendering.
Contribution
The paper presents a novel pipeline for creating complex geometry datasets from scratch and introduces code prediction as an explicit alignment objective for improved visual reasoning.
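Code prediction turns diagram understanding into supervised structured prediction. The model sees a rendered diagram and is trained to emit the plotting code behind it. Below is a minimal sketch of how one such training pair could be laid out as token IDs and labels.

Several parts are assumptions, not the paper's setup:
- The instruction wording, `build_example`, `to_supervised`, and the whitespace tokenizer are all hypothetical.
- Image features, which a real vision-language model would prepend, are omitted.
- The value -100 follows the common convention of PyTorch's `CrossEntropyLoss` default `ignore_index`.

```python
from dataclasses import dataclass

# Label value that the loss skips (PyTorch CrossEntropyLoss's default ignore_index).
IGNORE = -100


@dataclass
class CodePredictionExample:
    image_path: str   # rendered diagram (hypothetical path)
    prompt: str       # instruction shown alongside the image
    target_code: str  # plotting code the model must reproduce


def build_example(image_path, plot_code):
    # Hypothetical instruction; the paper's actual prompt wording is not given here.
    prompt = "Write the plotting code that reproduces this geometry diagram."
    return CodePredictionExample(image_path, prompt, plot_code)


def toy_tokenize(text, vocab):
    # Whitespace tokenizer standing in for a real model tokenizer;
    # unseen tokens get the next free integer ID.
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def to_supervised(example, vocab):
    prompt_ids = toy_tokenize(example.prompt, vocab)
    code_ids = toy_tokenize(example.target_code, vocab)
    input_ids = prompt_ids + code_ids
    # Only the code tokens carry labels, so the loss trains the model to emit code.
    labels = [IGNORE] * len(prompt_ids) + code_ids
    return input_ids, labels
```

Masking the prompt tokens means the gradient comes only from predicting the plotting code. This is what makes the diagram-to-code mapping an explicit alignment signal rather than a side effect of answering questions.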
Findings
Models trained on GeoCode outperform baselines on existing benchmarks.
GeoCode exhibits higher structural complexity and reasoning difficulty.
Using code prediction as an explicit alignment objective improves model performance.
Abstract
Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data and weak visual-symbolic alignment. We propose a pipeline for synthesizing complex multimodal geometry problems from scratch and construct a dataset named GeoCode, which decouples problem generation into symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering, ensuring consistency across structure, text, reasoning, and images. Leveraging the plotting code provided in GeoCode, we further introduce code prediction as an explicit alignment objective, transforming visual understanding into a supervised structured prediction task. GeoCode exhibits substantially higher structural complexity and…
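The three generation stages can be made concrete with a toy example. The sketch below covers one triangle problem with a median:
1. A symbolic seed with no coordinates.
2. A grounded instantiation that samples coordinates and is then verified.
3. Rendering, which emits matplotlib plotting code as a string rather than drawing anything.

Everything here is a hypothetical stand-in, not GeoCode's actual representation or checks. This includes the seed format, `instantiate`, `verify`, `render`, and `generate`. The verification step is specific to this toy seed: it rejects collinear triangles and cross-checks the median with Apollonius's theorem.

```python
import math
import random
from dataclasses import dataclass

# Stage 1: symbolic seed. A hypothetical toy format, not GeoCode's actual one:
# free points, constructions derived from them, and the quantity to ask about.
SEED = {
    "base": ["A", "B", "C"],
    "constructions": [("midpoint", "M", "B", "C")],
    "query": ("length", "A", "M"),
}


@dataclass
class Sample:
    question: str
    answer: float
    plot_code: str  # rendering code; also usable as a code-prediction target


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def instantiate(seed, rng):
    """Stage 2a: ground the seed by sampling integer coordinates for free points."""
    pts = {n: (rng.randint(0, 10), rng.randint(0, 10)) for n in seed["base"]}
    for kind, new, p, q in seed["constructions"]:
        if kind == "midpoint":
            pts[new] = ((pts[p][0] + pts[q][0]) / 2, (pts[p][1] + pts[q][1]) / 2)
    return pts


def verify(seed, pts):
    """Stage 2b (toy-specific): reject degenerate instances, cross-check the median."""
    A, B, C = (pts[n] for n in seed["base"])
    # Twice the signed area; zero means A, B, C are collinear (exact for integers).
    if (B[0] - A[0]) * (C[1] - A[1]) - (C[0] - A[0]) * (B[1] - A[1]) == 0:
        return False
    # Apollonius's theorem for the median AM: AB^2 + AC^2 = 2*AM^2 + 2*BM^2.
    M = pts["M"]
    lhs = dist(A, B) ** 2 + dist(A, C) ** 2
    rhs = 2 * dist(A, M) ** 2 + 2 * dist(B, M) ** 2
    return math.isclose(lhs, rhs)


def render(seed, pts):
    """Stage 3: emit matplotlib code for the diagram as a string (not executed here)."""
    a, b, c = seed["base"]
    lines = ["import matplotlib.pyplot as plt", "fig, ax = plt.subplots()"]
    for p, q in [(a, b), (b, c), (c, a), seed["query"][1:]]:
        lines.append(
            f"ax.plot([{pts[p][0]}, {pts[q][0]}], [{pts[p][1]}, {pts[q][1]}], 'k-')"
        )
    for name, (x, y) in pts.items():
        lines.append(f"ax.annotate('{name}', ({x}, {y}))")
    lines.append("ax.set_aspect('equal'); ax.axis('off')")
    return "\n".join(lines)


def generate(seed, rng, max_tries=100):
    """Resample until an instance passes verification, then emit text, answer, code."""
    for _ in range(max_tries):
        pts = instantiate(seed, rng)
        if verify(seed, pts):
            _, p, q = seed["query"]
            question = (
                f"In triangle ABC, A={pts['A']}, B={pts['B']}, C={pts['C']}, "
                f"and M is the midpoint of BC. Find the length of {p}{q}."
            )
            return Sample(question, round(dist(pts[p], pts[q]), 4), render(seed, pts))
    raise RuntimeError("no valid instantiation found")
```

For example, `generate(SEED, random.Random(0))` returns one verified problem. Keeping the diagram as code means the question, answer, and image all derive from the same grounded coordinates. The same string can also serve as the target for code prediction.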
Taxonomy
Topics: Data Visualization and Analytics · 3D Shape Modeling and Analysis · Handwritten Text Recognition Techniques
