Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
Zhikun Xu, Xiaodong Yu, Ben Zhou, Jiang Liu, Jialian Wu, Ze Wang, Ximeng Sun, Hao Chen, Zicheng Liu

TL;DR
This paper introduces RULES, a reinforcement learning framework that improves large language models' ability to correctly judge the usefulness of lemmas by formalizing the task and incorporating section-aware loss masking, leading to more robust lemma validation.
Contribution
The paper proposes a novel structured prediction approach with section-aware reinforcement learning to enhance lemma judgment accuracy in language models.
Findings
Consistent in-domain improvements over baseline models.
Enhanced robustness against perturbations that break applicability.
Maintained or slightly improved end-to-end task performance.
Abstract
Recent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent…
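The two-section specification and the idea of penalizing only the section at fault can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how the usefulness decision is derived or how the mask is computed, so the conjunction rule, the `LemmaJudgment` class, the `section_penalty_mask` function, and the 0/1 section labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LemmaJudgment:
    """Two-section judgment for a (statement, candidate lemma) pair."""
    precondition_ok: bool    # do the lemma's assumptions hold for the statement?
    conclusion_useful: bool  # does the lemma's conclusion help with the statement?

    @property
    def useful(self) -> bool:
        # Assumption: the lemma counts as useful only if both checks pass.
        return self.precondition_ok and self.conclusion_useful


def section_penalty_mask(
    section_ids: List[int],
    pred: LemmaJudgment,
    gold: LemmaJudgment,
) -> List[float]:
    """Per-token mask: 1.0 where the token's section made a wrong check.

    section_ids uses a hypothetical labeling: 0 marks tokens in the
    precondition section, 1 marks tokens in the conclusion-utility section.
    """
    section_wrong = {
        0: pred.precondition_ok != gold.precondition_ok,
        1: pred.conclusion_useful != gold.conclusion_useful,
    }
    return [1.0 if section_wrong[s] else 0.0 for s in section_ids]


# Example: the model accepts a lemma whose precondition actually fails.
pred = LemmaJudgment(precondition_ok=True, conclusion_useful=True)
gold = LemmaJudgment(precondition_ok=False, conclusion_useful=True)
mask = section_penalty_mask([0, 0, 1, 1, 1], pred, gold)
```

In this sketch only the precondition-section tokens are selected for penalty, since that is the section whose check disagrees with the reference. A training loop could multiply such a mask into a per-token loss, though how RULES combines it with the reinforcement learning objective is not specified in this excerpt.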
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
