TL;DR
This paper introduces MARS-GPS, a multi-chain reasoning approach with voting and verification, significantly improving geometric reasoning accuracy in large language models.
Contribution
It proposes a novel multi-chain-of-thought voting method with parallel rollouts and code verification, advancing logical inference in geometric problem solving.
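The contribution described above (parallel rollouts, code verification, voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before it explains how rollouts are ranked, so the sketch substitutes a plain majority vote over answers whose accompanying Python check reproduces the claimed value. The names `verify`, `vote`, and `demo_rollouts`, and the convention that each check assigns to a variable called `result`, are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical rollout format: (claimed numeric answer, Python source of its check).
Rollout = Tuple[float, str]


def verify(answer: float, check_source: str, tol: float = 1e-6) -> bool:
    """Run a rollout's check and compare its `result` with the claimed answer."""
    scope: dict = {}
    try:
        # Illustration only: model-generated code should run in a sandbox.
        exec(check_source, scope)
        return abs(float(scope.get("result")) - answer) < tol
    except Exception:
        # A crashing check, or one that never sets `result`, counts as a failure.
        return False


def vote(rollouts: List[Rollout]) -> Optional[float]:
    """Majority vote over verified answers; fall back to all answers if none verify."""
    verified = [round(ans, 4) for ans, src in rollouts if verify(ans, src)]
    pool = verified or [round(ans, 4) for ans, _ in rollouts]
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]


# Four made-up rollouts for "hypotenuse of a right triangle with legs 3 and 4".
demo_rollouts: List[Rollout] = [
    (5.0, "result = (3**2 + 4**2) ** 0.5"),
    (5.0, "import math\nresult = math.hypot(3, 4)"),
    (7.0, "result = 3 + 4"),   # internally consistent but wrong reasoning
    (6.0, "result = 5.0"),     # claimed answer contradicts its own check
]

print(vote(demo_rollouts))
```

In this toy run the fourth rollout is discarded by verification and the third is outvoted. That shows why the two mechanisms complement each other: a numerical check catches answers that contradict their own computation, while voting catches chains that are self-consistent but wrong.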
Findings
Achieves 88.8% accuracy on Geometry3K, surpassing the previous state of the art.
Accuracy improves as the number of rollouts increases, up to 16 rollouts, for a gain of +6.0%.
Multi-chain reasoning with voting enhances logical inference in LLMs.
Abstract
Geometric Problem Solving (GPS) remains at the heart of enhancing mathematical reasoning in large language models because it requires combining diagrammatic understanding, symbolic manipulation and logical inference. Existing work has chiefly focused on synchronising diagram descriptions with text literals and then solving the problem, taking a neural, symbolic or neuro-symbolic approach. But this addresses only the first two requirements, namely diagrammatic understanding and symbolic manipulation, while leaving logical inference underdeveloped: the logical inference is often limited to a single chain-of-thought (CoT). To address this weakness in existing models, this paper proposes MARS-GPS, which generates multiple parallel reasoning rollouts augmented with Python code execution for numerical verification, ranks them…
