LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)
Rongge Xu, Hui Dai, Yiming Fu, Jiedong Jiang, Tianjiao Nie, Junkai Wang, Holiverse Yang, Zhi-Hao Zhang

TL;DR
This paper introduces LeanCat, a challenging benchmark suite for formal category theory in Lean, revealing significant gaps in current models' abstraction capabilities and demonstrating the effectiveness of retrieval-augmented reasoning methods.
Contribution
The paper presents LeanCat, a new benchmark for formal category theory tasks, and evaluates retrieval-augmented models, showing improved performance and highlighting the importance of iterative reasoning.
Findings
Current models solve only 12% of tasks at pass@4.
Performance drops from 55% on Easy to 0% on High-difficulty tasks.
Retrieval-augmented LeanBridge doubles success rate to 24%.
Abstract
While large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, current benchmarks fail to adequately measure library-grounded abstraction -- the ability to reason with high-level interfaces and reusable structures central to modern mathematics and software engineering. We introduce LeanCat, a challenging benchmark comprising 100 fully formalized category-theory tasks in Lean. Unlike algebra or arithmetic, category theory serves as a rigorous stress test for structural, interface-level reasoning. Our evaluation reveals a severe abstraction gap: the best state-of-the-art model solves only 12.0% of tasks at pass@4, with performance collapsing from 55.0% on Easy tasks to 0.0% on High-difficulty tasks, highlighting a failure in compositional generalization. To overcome this, we evaluate LeanBridge, a retrieval-augmented agent that employs a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Materials Science · Mathematics, Computing, and Information Processing · Model-Driven Software Engineering Techniques
