MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev

TL;DR
MathConstruct is a new benchmark of 121 challenging math problems focusing on constructive proofs, designed to evaluate and improve the reasoning capabilities of Large Language Models beyond simple or memorized solutions.
Contribution
The paper introduces MathConstruct, a novel benchmark with automated verifiers and problem variations, addressing limitations of existing math benchmarks and emphasizing constructive proofs.
Findings
State-of-the-art LLMs solve only 60% of MathConstruct problems.
MathConstruct effectively evaluates LLM reasoning and robustness.
Problems are sourced from various math competitions, and candidate solutions are verified automatically.
Abstract
While Large Language Models (LLMs) demonstrate impressive performance in mathematics, existing math benchmarks come with significant limitations. Many focus on problems with fixed ground-truth answers, and are often saturated due to problem simplicity or the viability of guessing or memorization. Crucially, they capture only a narrow subset of relevant math problems. To address this research gap, we introduce MathConstruct, a new benchmark of 121 challenging problems sourced from various math competitions, which targets constructive proofs, a widely encountered problem type requiring the construction of mathematical objects with specific properties. These proofs are particularly suitable for LLM evaluation, as solution correctness can be easily verified. Our automated verifiers also enable MathConstruct to generate problem variations, used to evaluate robustness. State-of-the-art LLMs…
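Here is a minimal Python sketch of what an automated verifier for a constructive problem can look like. The problem is a hypothetical toy example and does not come from MathConstruct: for a given n, construct a permutation of 1, ..., n in which no three terms, read in order of position, form an arithmetic progression. The function names `verify_no_3ap_permutation` and `construct` are illustrative assumptions, not the benchmark's code. Checking a proposed answer is a mechanical O(n²) scan. Changing n produces new instances of the same problem, which is a simplified analogue of using parametrized variations to test robustness.

```python
from itertools import combinations


def verify_no_3ap_permutation(n: int, perm: list[int]) -> bool:
    """Hypothetical verifier: accept perm only if it is a permutation of 1..n
    with no terms perm[i], perm[j], perm[k] (i < j < k) forming an arithmetic progression."""
    if sorted(perm) != list(range(1, n + 1)):
        return False  # wrong length, duplicates, or out-of-range values
    pos = {v: idx for idx, v in enumerate(perm)}  # value -> position
    # Treat every pair of positions i < k as the two endpoints of a progression.
    for i, k in combinations(range(n), 2):
        a, c = perm[i], perm[k]
        if (a + c) % 2 == 0:
            b = (a + c) // 2  # the middle term; it lies between a and c, so 1 <= b <= n
            if i < pos[b] < k:
                return False  # found a, b, c in positional order
    return True


def construct(n: int) -> list[int]:
    """Hypothetical candidate construction: recursively place the odd values first, then the even values."""
    if n <= 1:
        return list(range(1, n + 1))
    odds = [2 * x - 1 for x in construct((n + 1) // 2)]
    evens = [2 * x for x in construct(n // 2)]
    return odds + evens


if __name__ == "__main__":
    # Hypothetical "problem variations": same statement, different n.
    for n in (3, 10, 57):
        print(n, verify_no_3ap_permutation(n, construct(n)))
    # A wrong answer is rejected, because 1, 2, 3 is an arithmetic progression.
    print(verify_no_3ap_permutation(3, [1, 2, 3]))
```

Why the construction works: when the two endpoints of a progression have different parity, their midpoint is not an integer. When both endpoints fall in the odd block, or both in the even block, the middle term falls in that same block too, and the affine map back to the smaller instance preserves the progression. Induction on n rules out both cases.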
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques
