CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean
Wentao Long, Yunfei Zhang, Chenyi Li, Li Zhou, Chumin Sun, Zaiwen Wen

TL;DR
CAM-Bench is a new benchmark with 1,000 formalized problems in computational and applied mathematics for evaluating large language models' mathematical reasoning in Lean.
Contribution
It introduces a novel benchmark covering applied math problems, with a dependency-recovery pipeline and validation process, expanding beyond existing Olympiad-focused benchmarks.
Findings
Evaluated large language models on CAM-Bench, identifying common failure modes.
Developed a dependency-recovery pipeline for formalizing textbook exercises.
Validated problems through Lean compilation and semantic review.
Abstract
Formal theorem-proving benchmarks enable mechanically verifiable evaluation of mathematical reasoning in large language models. However, existing benchmarks mainly focus on Olympiad-style problems and algebraic domains, leaving computational and applied mathematics underrepresented. We introduce CAM-Bench, a Lean 4 theorem-proving benchmark of 1,000 Lean proof targets in computational and applied mathematics, with coverage spanning optimization, numerical linear algebra, and numerical analysis. These problems are adapted from textbook exercises and often depend on locally introduced definitions, notation, algorithms, and elementary results. To construct CAM-Bench, we develop a dependency-recovery pipeline that reconstructs the local textbook context needed to state each problem faithfully. It then normalizes each problem into a standalone informal theorem and translates it into a Lean…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
