GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

Jiayin Sun; Caixia Sun; Boyu Yang; Hailin Li; Xiao Chen; Yi Zhang; Errui Ding; Liang Li; Chao Deng; Junlan Feng

arXiv:2603.22687·cs.CV·March 30, 2026

TL;DR

GeoTikzBridge is a framework that improves geometric perception and reasoning in multimodal large language models by generating TikZ code. It is supported by two large datasets, and its models achieve state-of-the-art results among open-source MLLMs.

Contribution

The paper presents the first instruction-augmented TikZ dataset, along with models that improve geometric understanding and reasoning in multimodal large language models.

Findings

01. Models achieve state-of-the-art performance among open-source MLLMs.

02. GeoTikzBridge models serve as plug-and-play modules for geometric reasoning.

03. The datasets are the largest of their kind, supporting extensive geometric perception.

Abstract

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, which limits their geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through TikZ-based code generation. Within this framework, we build two models supported by two complementary datasets. The GeoTikzBridge-Base model is trained on the GeoTikz-Base dataset, the largest image-to-TikZ dataset to date with 2.5M pairs (16$\times$ larger than existing open-source datasets). The dataset is built via iterative data expansion and a localized geometric transformation strategy. Subsequently, GeoTikzBridge-Instruct is fine-tuned on the GeoTikz-Instruct dataset, which is the first…
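
The abstract does not spell out what the localized geometric transformation does. As a rough illustration only, the sketch below assumes it means editing one small part of a figure's TikZ source while leaving the rest unchanged, so that each edit yields a new code sample that could be rendered into a new image-code pair. The triangle source, the `shift_point` helper, and the coordinate pattern are all hypothetical and are not taken from the paper or its repository.

```python
import re

# Hypothetical TikZ source for a triangle; not taken from the GeoTikz-Base dataset.
TIKZ_TRIANGLE = r"""\begin{tikzpicture}
  \coordinate (A) at (0,0);
  \coordinate (B) at (4,0);
  \coordinate (C) at (1,3);
  \draw (A) -- (B) -- (C) -- cycle;
\end{tikzpicture}"""

# Matches lines such as "\coordinate (A) at (0,0);" (assumed input format).
COORD_PATTERN = re.compile(
    r"\\coordinate \((?P<name>\w+)\) at "
    r"\((?P<x>-?\d+(?:\.\d+)?),(?P<y>-?\d+(?:\.\d+)?)\);"
)


def shift_point(tikz_source: str, point: str, dx: float, dy: float) -> str:
    """Return a copy of tikz_source with one named coordinate moved by (dx, dy)."""

    def replace(match: re.Match) -> str:
        if match.group("name") != point:
            return match.group(0)  # leave every other point untouched
        x = float(match.group("x")) + dx
        y = float(match.group("y")) + dy
        return f"\\coordinate ({point}) at ({x:g},{y:g});"

    return COORD_PATTERN.sub(replace, tikz_source)


# Move only vertex C; A, B, and the \draw command are unchanged.
variant = shift_point(TIKZ_TRIANGLE, "C", 1.0, -0.5)
print(variant)
```

Compiling both `TIKZ_TRIANGLE` and `variant` with a LaTeX toolchain would give two images paired with their source code. How the authors actually select and apply transformations is not stated in the text above.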

Code & Models

Repositories

sjy-1995/GeoTikzBridge (GitHub)
