Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver
Zeren Zhang, Jo-Ku Cheng, Jingyang Deng, Lu Tian, Jinwen Ma, Ziran Qin, Xiaokai Zhang, Na Zhu, Tuo Leng

TL;DR
This paper introduces DFE-GPS, a framework that combines visual features, geometric formal language, and natural language representations to improve how AI models understand and solve geometry problems that involve diagrams.
Contribution
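It presents SynthGeo228K, a large-scale synthetic geometry dataset annotated with both formal and natural language captions, together with a diagram formalization method that strengthens the ability of multi-modal large language models (MLLMs) to interpret geometric diagrams.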
Findings
Improved performance on geometric problem solving tasks.
Enhanced understanding of geometric diagrams by MLLMs.
Effective integration of visual and formal language representations.
Abstract
Mathematical reasoning remains an ongoing challenge for AI models, especially for geometry problems that require both linguistic and visual signals. As the vision encoders of most MLLMs are trained on natural scenes, they often struggle to understand geometric diagrams, performing no better in geometry problem solving than LLMs that only process text. This limitation is amplified by the lack of effective methods for representing geometric relationships. To address these issues, we introduce the Diagram Formalization Enhanced Geometry Problem Solver (DFE-GPS), a new framework that integrates visual features, geometric formal language, and natural language representations. We propose a novel synthetic data approach and create a large-scale geometric dataset, SynthGeo228K, annotated with both formal and natural language captions, designed to enhance the vision encoder for a better…
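The abstract describes synthetic diagrams paired with both formal and natural language captions. The sketch below illustrates only that pairing idea, under loudly stated assumptions: the fact encoding, the `Triangle(...)` / `Perpendicular(...)` / `Equal(Length(...))` syntax, and every function name (`derive_facts`, `to_formal`, `to_natural`, `synth_right_triangle`) are hypothetical, are not the formal language or data pipeline used by DFE-GPS, and no diagram image is rendered. Facts are read off generated coordinates and then rendered twice, so both captions agree with the geometry by construction.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass

# Hypothetical fact encoding, e.g. ("triangle", "ABC"), ("perpendicular", "AC", "BC"),
# ("length", "CB", 5.0). This is NOT the formal language used by DFE-GPS.
Fact = tuple


@dataclass
class SyntheticSample:
    points: dict[str, tuple[int, int]]  # point name -> integer coordinates
    formal: str                         # formal-language caption (illustrative syntax)
    natural: str                        # natural-language caption


def is_right_angle(p, vertex, q) -> bool:
    """Angle p-vertex-q is 90 degrees iff the dot product of the two rays is 0."""
    ux, uy = p[0] - vertex[0], p[1] - vertex[1]
    vx, vy = q[0] - vertex[0], q[1] - vertex[1]
    return ux * vx + uy * vy == 0


def derive_facts(points: dict[str, tuple[int, int]]) -> list[Fact]:
    """Read facts off the coordinates instead of hard-coding them."""
    facts: list[Fact] = [("triangle", "ABC")]
    if is_right_angle(points["A"], points["C"], points["B"]):
        facts.append(("perpendicular", "AC", "BC"))
    for seg in ("CB", "CA"):
        facts.append(("length", seg, math.dist(points[seg[0]], points[seg[1]])))
    return facts


def to_formal(facts: list[Fact]) -> str:
    """Render facts in a hypothetical predicate syntax."""
    out = []
    for f in facts:
        if f[0] == "triangle":
            out.append(f"Triangle({','.join(f[1])})")
        elif f[0] == "perpendicular":
            (a, b), (c, d) = f[1], f[2]
            out.append(f"Perpendicular(Line({a},{b}),Line({c},{d}))")
        elif f[0] == "length":
            p, q = f[1]
            out.append(f"Equal(Length(Line({p},{q})),{f[2]:g})")
    return "; ".join(out)


def to_natural(facts: list[Fact]) -> str:
    """Render the same facts as plain English sentences."""
    out = []
    for f in facts:
        if f[0] == "triangle":
            out.append(f"{f[1]} is a triangle.")
        elif f[0] == "perpendicular":
            out.append(f"{f[1]} is perpendicular to {f[2]}.")
        elif f[0] == "length":
            out.append(f"{f[1]} has length {f[2]:g}.")
    return " ".join(out)


def synth_right_triangle(seed: int) -> SyntheticSample:
    """Generate one right triangle (right angle at C) with paired captions."""
    rng = random.Random(seed)
    cb, ca = rng.randint(2, 12), rng.randint(2, 12)  # random leg lengths
    points = {"C": (0, 0), "B": (cb, 0), "A": (0, ca)}
    facts = derive_facts(points)
    return SyntheticSample(points, to_formal(facts), to_natural(facts))


if __name__ == "__main__":
    print(synth_right_triangle(seed=0))
```

Deriving both captions from one shared fact list, rather than writing each caption independently, is what keeps the formal and natural language annotations consistent with each other and with the diagram; a real generator would also need to render the image and cover far more shapes and relations.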
