Solving Inequality Proofs with Large Language Models
Pan Lu, Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou

TL;DR
This paper investigates how well large language models can prove inequalities. It introduces a new dataset, IneqMath, and an evaluation framework showing that current models struggle to produce rigorous proofs, which points to directions for future research.
Contribution
The authors present IneqMath, an expert-curated dataset of Olympiad-level inequalities, and develop a novel LLM-as-judge evaluation framework that exposes significant gaps in models' proof-reasoning abilities.
Findings
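Even top models achieve less than 10% overall accuracy when their proofs are checked step by step.
Accuracy drops by up to 65.5% compared with judging the final answer alone, once the reasoning steps are also checked.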
Scaling model size and increasing test-time computation yield only limited improvements.
Abstract
Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer…
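To make the two subtasks concrete, here is a minimal sketch of how their final answers could be checked automatically. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the relation label set, and the toy problems are hypothetical. For bound estimation, the toy problem is "find the largest constant C such that a^2 + b^2 >= C*a*b for all positive a, b" (answer C = 2, by AM-GM). For relation prediction, the toy problem is "which relation holds between a^2 + b^2 and 2ab for all positive a, b?" (answer >=).

```python
from fractions import Fraction

# Hypothetical label set for relation prediction; the source does not list the options.
RELATIONS = {">=", "<=", "=", ">", "<", "none"}

# Map common Unicode or alternative spellings onto the labels above.
ALIASES = {"≥": ">=", "≤": "<=", "==": "="}


def normalize_relation(symbol: str) -> str:
    """Strip whitespace and map aliases to a canonical relation label."""
    s = symbol.strip()
    return ALIASES.get(s, s)


def check_relation(predicted: str, expected: str) -> bool:
    """Relation prediction: exact match on the canonical label."""
    p = normalize_relation(predicted)
    return p in RELATIONS and p == normalize_relation(expected)


def check_bound(predicted: str, expected: str) -> bool:
    """Bound estimation: compare constants exactly as rationals.

    Assumption: answers are rational strings such as "2", "1/2" or "0.5".
    Irrational constants (for example sqrt(2)) would need a symbolic
    comparison, which this sketch does not attempt.
    """
    try:
        return Fraction(predicted.strip()) == Fraction(expected.strip())
    except (ValueError, ZeroDivisionError):
        return False


if __name__ == "__main__":
    # Toy bound problem: largest C with a^2 + b^2 >= C*a*b is C = 2.
    print(check_bound("2", "2"))
    # Toy relation problem: a^2 + b^2 versus 2ab.
    print(check_relation("≥", ">="))
```

A check like this only covers the final answer. It says nothing about whether the reasoning that led to the answer is sound, which is the gap the step-wise evaluation in the paper is meant to address.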
Taxonomy
Topics: Machine Learning in Materials Science · Mathematics, Computing, and Information Processing · Topic Modeling
Methods: Sparse Evolutionary Training
