TL;DR
NeSyGeo introduces a neuro-symbolic framework for generating diverse geometric reasoning data and releases a new dataset and benchmark. Training on the generated data substantially improves the geometric reasoning of multi-modal large language models (MLLMs).
Contribution
The paper presents a neuro-symbolic framework that combines a domain-specific language with a generation pipeline to produce high-quality geometric reasoning data, and shows that this data improves MLLM performance.
Findings
Achieved up to +15.8% improvement on MathVision
Constructed 100k-sample datasets and a new benchmark
Enhanced MLLMs' geometric reasoning with limited data
Abstract
Obtaining large-scale, high-quality reasoning data is crucial for improving the geometric reasoning capabilities of multi-modal large language models (MLLMs). However, existing data generation methods, whether based on predefined templates or constrained symbolic provers, inevitably face limitations in diversity and numerical generalization. To address these limitations, we propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. First, we propose a domain-specific language grounded in the entity-attributes-relations paradigm to comprehensively represent all components of plane geometry, along with generative actions defined within this symbolic space. We then design a symbolic-visual-text pipeline that synthesizes symbolic sequences, maps them to visual and textual representations, and generates reasoning paths with reverse search and forward validation.…
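To make the entity-attributes-relations idea concrete, here is a minimal Python sketch of what a symbolic state and one generative action might look like. Everything in it (`GeoState`, `add_segment`, `right_triangle_action`, the relation tuples) is a hypothetical illustration. The actual Geo-DSL reportedly defines its own set of primitives, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of an entity-attributes-relations state.
# Class, method, and relation names are illustrative assumptions,
# not NeSyGeo's actual Geo-DSL primitives.


@dataclass
class GeoState:
    entities: set = field(default_factory=set)      # points and segments, e.g. "A", "AB"
    attributes: dict = field(default_factory=dict)  # metric attributes, e.g. ("length", "AB") -> 3.0
    relations: set = field(default_factory=set)     # topological relations, e.g. ("perpendicular", "AB", "BC")

    def add_segment(self, p: str, q: str, length: Optional[float] = None) -> str:
        """Register a segment and its endpoints; optionally record its length."""
        seg = p + q
        self.entities.update({p, q, seg})
        if length is not None:
            self.attributes[("length", seg)] = length
        return seg


def right_triangle_action(state: GeoState, a: str, b: str, c: str,
                          leg1: float, leg2: float) -> GeoState:
    """Hypothetical generative action: a right triangle with the right angle at b.
    The hypotenuse length is left unknown so it can serve as a question target."""
    ab = state.add_segment(a, b, leg1)
    bc = state.add_segment(b, c, leg2)
    state.add_segment(a, c)
    state.relations.add(("perpendicular", ab, bc))
    return state


s = right_triangle_action(GeoState(), "A", "B", "C", 3.0, 4.0)
print(sorted(s.entities))  # ['A', 'AB', 'AC', 'B', 'BC', 'C']
print(s.relations)         # {('perpendicular', 'AB', 'BC')}
```

Keeping entities, metric attributes, and relations in separate containers means a generator can add facts by composing actions, and a question can be posed about any attribute that is deliberately left unset.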
Peer Reviews
Decision: Submitted to ICLR 2026
1. The definition of the Geo-DSL is reasonable, covering primitive entities, metric attributes and topological relations. 2. The problem generation process is reliable: it uses backward search and forward verification and finally produces a natural-language description of the CoT process. Combined with the diagram painter, this yields the complete generated problem text, diagram and solution. 3. Based on the experiments, the generated dataset helps the model rely more on the diagram to solve the problem, differe
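The backward-search and forward-verification step can be illustrated with a toy, purely symbolic version. The search works backward from a target quantity to the known givens, and the resulting derivation is then replayed forward with each value checked against the construction's ground truth. The rule table, function names, and tolerance are assumptions. In NeSyGeo the search reportedly involves large models (a Reasoner-Verifier setup), and this sketch does not attempt to model that.

```python
import math

# Hypothetical rule table: (rule name, derived quantity, inputs, function).
# A hand-written stand-in for illustration; NeSyGeo's actual rules and its
# model-assisted search are not reproduced here.
RULES = [
    ("pythagoras", "AC", ("AB", "BC"), lambda ab, bc: math.hypot(ab, bc)),
    ("perimeter", "P", ("AB", "BC", "AC"), lambda ab, bc, ac: ab + bc + ac),
    ("right_area", "S", ("AB", "BC"), lambda ab, bc: ab * bc / 2),
]


def backward_search(target, known, depth=0, max_depth=5):
    """Goal-directed search: return an ordered list of rules that derives
    `target` from the `known` quantities, or None if no derivation exists."""
    if target in known:
        return []
    if depth >= max_depth:
        return None
    for rule in RULES:
        _, out, inputs, _ = rule
        if out != target:
            continue
        plan = []
        for q in inputs:
            sub = backward_search(q, known, depth + 1, max_depth)
            if sub is None:
                break  # this rule's premises cannot all be derived
            for r in sub:
                if r not in plan:
                    plan.append(r)
        else:
            return plan + [rule]
    return None


def forward_validate(plan, givens, ground_truth, tol=1e-9):
    """Replay the plan from the givens and check each derived value
    against the ground truth computed from the construction."""
    values = dict(givens)
    for _, out, inputs, fn in plan:
        values[out] = fn(*(values[q] for q in inputs))
        if abs(values[out] - ground_truth[out]) > tol:
            return False, values
    return True, values


# Right triangle with legs 3 and 4, e.g. A=(0,3), B=(0,0), C=(4,0).
givens = {"AB": 3.0, "BC": 4.0}
truth = {"AB": 3.0, "BC": 4.0, "AC": 5.0, "P": 12.0, "S": 6.0}
plan = backward_search("P", set(givens))
ok, values = forward_validate(plan, givens, truth)
print([r[0] for r in plan], ok, values["P"])  # ['pythagoras', 'perimeter'] True 12.0
```

Separating the two phases lets the search propose a derivation cheaply, while the forward replay rejects any chain whose numbers disagree with the underlying construction.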
1. For a plane geometry problem generation work, the paper does not explain how the DSL is translated into a diagram: how are the locations or coordinates of the points in the diagram obtained? Also, is it possible to generate diagrams that contain only lines and no closed geometric shapes, such as a problem with two parallel lines cut by a transversal that asks about alternate angles? 2. For a dataset generation work, experiments are essential to show the effectiveness of the proposed datase
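The review does not say how NeSyGeo assigns coordinates. One simple possibility is constructive placement: fix one line, then place every later point directly from its constraints. This is offered as an assumption only. The sketch below applies it to the reviewer's parallel-lines-and-transversal example and checks that the alternate interior angles come out equal. Function names and default values are hypothetical.

```python
import math

# Hypothetical constructive placement for the reviewer's example: two parallel
# lines cut by a transversal. Coordinates are chosen directly from the
# constraints; this is an assumption, not NeSyGeo's documented diagram method.


def place_parallel_lines_with_transversal(gap=3.0, angle_deg=60.0, x0=0.0, arm=5.0):
    """Line 1 is y = 0, line 2 is y = gap; the transversal passes through
    P = (x0, 0) and makes `angle_deg` with both lines."""
    t = math.radians(angle_deg)
    p = (x0, 0.0)                          # transversal meets line 1
    q = (x0 + gap / math.tan(t), gap)      # transversal meets line 2
    r = (x0 + arm, 0.0)                    # point on line 1, right of P
    s = (q[0] - arm, gap)                  # point on line 2, left of Q
    return {"P": p, "Q": q, "R": r, "S": s}


def angle_at(vertex, a, b):
    """Unsigned angle a-vertex-b in degrees, in [0, 180]."""
    ang1 = math.atan2(a[1] - vertex[1], a[0] - vertex[0])
    ang2 = math.atan2(b[1] - vertex[1], b[0] - vertex[0])
    d = abs(math.degrees(ang2 - ang1)) % 360
    return 360 - d if d > 180 else d


pts = place_parallel_lines_with_transversal()
# Alternate interior angles: angle RPQ on line 1 and angle SQP on line 2.
rpq = angle_at(pts["P"], pts["R"], pts["Q"])
sqp = angle_at(pts["Q"], pts["S"], pts["P"])
print(round(rpq, 6), round(sqp, 6))  # 60.0 60.0
```

Because the construction involves only lines and no closed shape, it also shows that such diagrams are at least representable with plain coordinates. Whether NeSyGeo's generator produces them is the reviewer's open question.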
1. For the first time, differentiable neural search is combined with strict symbolic constraints, achieving a balance between diversity and correctness. 2. Geo-DSL can fully express plane geometry with only 37 primitives, and its interpretation is unambiguous. 3. The two-stage generation of reverse search plus forward verification significantly reduces hallucinations, and the data quality under manual evaluation is the best among compared methods. 4. The orthogonal design of visual and textual information forces the model to truly read
1. Geo-DSL only defines the basic elements of planar Euclidean geometry and lacks symbolic primitives for coordinates, vectors, solid geometry and analytic geometry. This results in a significant gap between the types of questions generated and those found in real exams and textbooks, especially comprehensive questions. 2. The lengths and angles are uniformly sampled within a fixed range, and the angles are forced to be multiples of 15°. This results in the frequencies of sp
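The sampling scheme described in this review can be sketched as follows, with lengths drawn uniformly from a fixed range and angles restricted to multiples of 15°. The bounds are hypothetical. The sketch only shows that such a scheme can ever produce a small, fixed set of angle values, each with roughly equal frequency.

```python
import random
from collections import Counter

# Sampling scheme as described in the review: lengths and angles drawn uniformly
# from a fixed range, with angles restricted to multiples of 15 degrees.
# The specific bounds below are assumptions for illustration.
LENGTH_RANGE = (1.0, 10.0)                 # hypothetical length bounds
ANGLE_CHOICES = list(range(15, 180, 15))   # 15, 30, ..., 165 degrees


def sample_length(rng: random.Random) -> float:
    """Uniform length in LENGTH_RANGE."""
    return rng.uniform(*LENGTH_RANGE)


def sample_angle(rng: random.Random) -> int:
    """Uniform choice among the allowed multiples of 15 degrees."""
    return rng.choice(ANGLE_CHOICES)


rng = random.Random(0)  # fixed seed for reproducibility
lengths = [sample_length(rng) for _ in range(1000)]
angles = [sample_angle(rng) for _ in range(11000)]
counts = Counter(angles)

# Only 11 distinct angle values exist, each drawn about 1/11 of the time,
# so familiar textbook angles (30, 45, 60, 90) make up roughly 4/11 of samples.
common_share = sum(counts[a] for a in (30, 45, 60, 90)) / len(angles)
print(len(counts), round(common_share, 2))
```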
1. The core problem identified and solved by the authors is quite interesting: because of text-image redundancy in existing datasets, models can ignore the images while solving geometry questions. 2. The framework combines the strength of the symbolic Geo-DSL with the flexibility of neural models, and the Reasoner-Verifier paradigm helps create diverse and correct reasoning paths. 3. The generated data is diverse and of high quality, and training is efficient: via only 4k samples and two RL epochs, th
1. My only concern is that, while the method is limited to plane geometry, its complexity (multiple stages, API calls, etc.) is high and it depends heavily on powerful large-scale teacher models (e.g., DeepSeek and GPT).
