CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang,, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong,, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu

TL;DR
CombiBench is a new benchmark for evaluating large language models on combinatorial mathematics problems, featuring formalized problems in Lean4 and a standardized evaluation framework, revealing current models' limited capabilities in this domain.
Contribution
The paper introduces CombiBench, a comprehensive combinatorial problem benchmark with formalization in Lean4 and a novel evaluation framework, filling a gap in formal mathematics benchmarking.
Findings
Kimina-Prover solved 7 out of 100 problems.
LLMs show limited capability in formal combinatorial problem solving.
Benchmark and evaluation tools are open-sourced for future research.
Abstract
Neurosymbolic approaches integrating large language models with formal reasoning have recently achieved human-level performance on mathematics competition problems in algebra, geometry and number theory. In comparison, combinatorics remains a challenging domain, characterized by a lack of appropriate benchmarks and theorem libraries. To address this gap, we introduce CombiBench, a comprehensive benchmark comprising 100 combinatorial problems, each formalized in Lean~4 and paired with its corresponding informal statement. The problem set covers a wide spectrum of difficulty levels, ranging from middle school to IMO and university level, and span over ten combinatorial topics. CombiBench is suitable for testing IMO solving capabilities since it includes all IMO combinatorial problems since 2000 (except IMO 2004 P3 as its statement contain an images). Furthermore, we provide a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing
MethodsSparse Evolutionary Training
