Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

Zenan Li; Zhaoyu Li; Wen Tang; Xian Zhang; Yuan Yao; Xujie Si; Fan; Yang; Kaiyu Yang; Xiaoxing Ma

arXiv:2502.13834·cs.AI·February 28, 2025

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

Zenan Li, Zhaoyu Li, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan, Yang, Kaiyu Yang, Xiaoxing Ma

PDF

Open Access 1 Repo 3 Reviews

TL;DR

This paper presents a neuro-symbolic approach combining large language models and symbolic methods to prove Olympiad inequalities, achieving state-of-the-art results without extra training data.

Contribution

It introduces a novel neuro-symbolic tactic generator that synergizes LLMs and symbolic methods specifically for Olympiad inequalities, improving proof efficiency and accuracy.

Findings

01

Achieved state-of-the-art performance on 161 Olympiad inequalities.

02

Significantly outperformed existing LLM and symbolic approaches.

03

Did not require additional training data.

Abstract

Large language models (LLMs) can prove mathematical theorems formally by generating proof steps (\textit{a.k.a.} tactics) within a proof system. However, the space of possible tactics is vast and complex, while the available training data for formal proofs is limited, posing a significant challenge to LLM-based tactic generation. To address this, we introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. The key aspect of this integration is identifying which parts of mathematical reasoning are best suited to LLMs and which to symbolic methods. While the high-level idea of neuro-symbolic integration is broadly applicable to various mathematical problems, in this paper, we focus specifically on Olympiad inequalities (Figure~1). We analyze how humans solve these problems and distill…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01Rating 8Confidence 3

Strengths

+ Solving mathematical inequalities is a challenging, and significant task. It also has application to formal verification. + The proposed approach is a nice integration of symbolic and neural methods. The criterion to distinguish between their use in tactic application is clear: reasoning in a finite search space is conducted symbolically and reasoning in an infinite space is neurally. The idea of symbolic filtering and neural scoring is natural, but its significance has been (partly) proven i

Weaknesses

- I don't think this is a weakness of the current form of the work, but the paper could enhance the work and broaden the audience if it can claim there are other math domains with two types of tactics such that one is suitable for symbolic reasoning (or the search space of its application is finite) and the other is for neural reasoning (or its search space is infinite), like scaling and rewriting in inequality problems. - Some parts of the proposed approach are unclear (see the following quest

Reviewer 02Rating 5Confidence 4

Strengths

The work demonstrates the potential of combining machine learning and symbolic/formal techniques, which is an interesting and still under-explored direction. The design of LIPS looks well thought out, leveraging appropriate symbolic techniques in the pipeline. The authors compared LIPS against a representative variety of baselines, both symbolic and ML-based. The paper is mostly written clearly, and provides enough background information and helpful examples (but some parts are less clear --

Weaknesses

The test dataset is not publicly available. In particular, the Lean and SMT formalizations are not available. It is relatively easily to introduce unintended errors when formalizing math, so open-sourcing the formalizations is important to make inspection possible. Also, it is crucial to allow other researchers to compare their approaches by providing the test data. The implementation of LIPS and MCTS is not available for testing and inspection, and there is no promise of open sourcing it. So

Reviewer 03Rating 8Confidence 3

Strengths

The paper takes a novel approach (to my knowledge) to solve olympiad inequalities following the recent paradigm shift of using test-time compute to solve mathematical problems. The domain specificity of solving inequality problems is very well exploited. The method rigorously categorizes proof tactics into scaling and rewriting, with each category handled by techniques suited to its complexity. This results in successful results as demonstrated by extensive experiments and results in proofs writ

Weaknesses

It is unclear to me how LLM rewritings are formalized in Lean 4 - is it a Lean 4 tactic that can do the rewrites of LLM? If that is the case, it would be interesting to use this as a tactic in general theorem proving in Lean 4. The datasets used (e.g., ChenNEQ, MO-INT, 567NEQ) focus on specific competition-level inequalities, which might limit generalizability. I don't understand why authors took a random sample of 100 from 567NEQ instead of running the system on all of them. I think it is impo

Code & Models

Repositories

lizn-zn/neqlips
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsIntelligent Tutoring Systems and Adaptive Learning

MethodsFocus