Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving
Jingyun Wang, Dian Li, Xiaohan Wang, Gang Liu, Jiahong Yan, Guoliang Kang

TL;DR
This paper introduces a method that converts visual geometric diagrams into textual descriptions using a trained interpreter, enabling large language models to solve plane geometry problems effectively without joint visual reasoning training.
Contribution
The authors propose a geometric description-based approach using CDL and a specialized interpreter, improving PGPS performance with limited data and avoiding joint visual reasoning training.
Findings
The method outperforms existing MLLMs on multiple datasets.
Using CDL matching rewards enhances CDL generation quality.
Training on only 5.5k data achieves competitive results.
Abstract
Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometric problem based on a geometric diagram and problem textual descriptions. Although Large Language Models (LLMs) possess strong reasoning skills, their direct application to PGPS is hindered by their inability to process visual diagrams. Existing works typically fine-tune Multimodal LLMs (MLLMs) end-to-end on large-scale PGPS data to enhance visual understanding and reasoning simultaneously. However, such joint optimization may compromise base LLMs' inherent reasoning capability. In this work, we observe that LLM itself is potentially a powerful PGPS solver when appropriately formulating visual information as textual descriptions. We propose to train a MLLM Interpreter to generate geometric descriptions for the visual diagram, and an off-the-shelf LLM is utilized to perform reasoning.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
