TL;DR
GeometryZero is a family of geometry reasoning models trained with Group Contrastive Policy Optimization (GCPO), a reinforcement learning framework that rewards auxiliary constructions only when they help, which improves performance on geometry benchmarks.
Contribution
The paper presents Group Contrastive Policy Optimization (GCPO), a new RL method that improves geometry reasoning by better rewarding useful auxiliary constructions, enabling smaller models to perform competitively.
Findings
GeometryZero outperforms RL baselines on Geometry3K and MathVista datasets.
GCPO effectively distinguishes useful from harmful auxiliary constructions.
Using auxiliary construction improves geometry problem-solving accuracy.
Abstract
Recent progress in large language models (LLMs) has boosted mathematical reasoning, yet geometry remains challenging where auxiliary construction is often essential. Prior methods either underperform or depend on very large models (e.g., GPT-4o), making them costly. We argue that reinforcement learning with verifiable rewards (e.g., GRPO) can train smaller models to couple auxiliary construction with solid geometric reasoning. However, naively applying GRPO yields unconditional rewards, encouraging indiscriminate and sometimes harmful constructions. We propose Group Contrastive Policy Optimization (GCPO), an RL framework with two components: (1) Group Contrastive Masking, which assigns positive/negative construction rewards based on contextual utility, and (2) a Length Reward that encourages longer reasoning chains. On top of GCPO, we build GeometryZero, an affordable family of geometry…
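The two GCPO components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulas, so everything here is an assumption made for illustration: the function names `construction_rewards` and `length_reward`, the threshold `eps`, the cap `max_tokens`, and the rule itself (compare the accuracy of rollouts that used an auxiliary construction against those that did not, within one sampled group). It is not the paper's implementation.

```python
from statistics import mean


def construction_rewards(used_aux, correct, eps=0.0):
    """Hypothetical group-contrastive construction reward for one rollout group.

    used_aux[i] -- True if rollout i used an auxiliary construction
    correct[i]  -- True if rollout i's final answer was verified correct
    eps         -- assumed margin below which the accuracy gap counts as a tie
    """
    with_aux = [float(c) for u, c in zip(used_aux, correct) if u]
    without_aux = [float(c) for u, c in zip(used_aux, correct) if not u]

    # No contrast is possible if every rollout made the same choice: mask all.
    if not with_aux or not without_aux:
        return [0.0] * len(used_aux)

    gap = mean(with_aux) - mean(without_aux)
    if gap > eps:
        sign = 1.0    # construction helped on this problem
    elif gap < -eps:
        sign = -1.0   # construction hurt on this problem
    else:
        sign = 0.0    # no clear signal: mask the construction reward

    # Only rollouts that actually used a construction receive the signed reward.
    return [sign if u else 0.0 for u in used_aux]


def length_reward(num_tokens, max_tokens=1024):
    """Hypothetical length reward: grows linearly with length, capped at 1.0."""
    return min(num_tokens, max_tokens) / max_tokens


if __name__ == "__main__":
    used = [True, True, False, False]
    print(construction_rewards(used, [True, True, False, True]))
    print(construction_rewards(used, [False, False, True, True]))
    print(length_reward(512))
```

Under these assumptions, a construction is rewarded only on problems where it raises accuracy within the group and penalized where it lowers it, which is the contrast with an unconditional construction reward that the abstract describes.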
