LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning
Haiying Xu, Zihan Wang, Song Dai, Zhengxuan Zhang, Kairan Dou, and Xuming Hu

TL;DR
LatentGeo is a framework that learns continuous latent visual representations to internalize auxiliary geometric constructions, improving multimodal geometric reasoning without relying on explicit rendering or external tools.
Contribution
It proposes LatentGeo, a method that combines a three-stage curriculum with reinforcement learning to internalize geometric constructions in latent space, improving the reasoning accuracy of multimodal models.
Findings
Significant performance improvements on GeoAux and MathVerse benchmarks.
Effective internalization of auxiliary geometric constructions without pixel rendering.
Validation through extensive ablation studies.
Abstract
Despite recent advances in multimodal reasoning, representing auxiliary geometric constructions remains a fundamental challenge for multimodal large language models (MLLMs). Such constructions are absent from the original diagram and must be introduced before theorems apply. Existing approaches predominantly rely on explicit construction paradigms, including text-based geometric specification, visual-token interleaving during reasoning, and tool-augmented geometric execution. However, these methods either fail to represent complex spatial relationships faithfully, incur a representation mismatch between discrete symbols and continuous geometric structures, or rely on external capabilities that hinder end-to-end optimization. To address these limitations, we propose LatentGeo, a framework that learns continuous latent visual representations to internalize auxiliary geometric constructions…
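To make "auxiliary construction" concrete, a classic example is dropping the altitude from a vertex to the opposite side: the foot point and the right angle it creates are not in the original diagram, yet they are what let the area formula (½ × base × height) apply. The sketch below computes that foot point explicitly, roughly in the spirit of the explicit, tool-style constructions the abstract contrasts with. It is an illustration only and is not LatentGeo's method, which by the abstract's account represents such constructions in a learned latent space rather than computing them. The names `Point`, `foot_of_perpendicular`, and `altitude_length` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def foot_of_perpendicular(p: Point, a: Point, b: Point) -> Point:
    """Orthogonally project p onto the line through a and b.

    The returned point is an auxiliary construction: it does not exist
    in the original diagram until the altitude is drawn.
    """
    dx, dy = b.x - a.x, b.y - a.y
    denom = dx * dx + dy * dy
    if denom == 0:
        raise ValueError("a and b must be distinct to define a line")
    # Parameter t such that a + t * (b - a) is the projection of p.
    t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / denom
    return Point(a.x + t * dx, a.y + t * dy)


def altitude_length(p: Point, a: Point, b: Point) -> float:
    """Length of the auxiliary segment from p to its foot on line ab."""
    f = foot_of_perpendicular(p, a, b)
    return math.hypot(p.x - f.x, p.y - f.y)


# Triangle ABC; the altitude from A to side BC is the auxiliary construction.
A, B, C = Point(1.0, 3.0), Point(0.0, 0.0), Point(4.0, 0.0)
H = foot_of_perpendicular(A, B, C)
h = altitude_length(A, B, C)
base = math.hypot(C.x - B.x, C.y - B.y)
print(H)             # Point(x=1.0, y=0.0)
print(0.5 * base * h)  # 6.0, area via the newly constructed altitude
```

Once the foot point H exists, the area theorem applies directly; without it, the same fact has to be reached some other way.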
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Advanced Graph Neural Networks
