Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
Matthieu Zimmer, Xiaotong Ji, Rasul Tutunov, Anthony Bordg, Jun Wang, Haitham Bou Ammar

TL;DR
This paper introduces self-generated goal-conditioned MDPs for theorem proving, enabling structured subgoal pursuit with LLMs, and demonstrates state-of-the-art results on PutnamBench using a modular system called Bourbaki.
Contribution
It proposes a novel sG-MDP framework for theorem proving, integrating MCTS and ensemble LLMs, and achieves new performance benchmarks.
Findings
Bourbaki (7B) solves 26 PutnamBench problems.
The sG-MDP framework improves search efficiency in theorem proving.
State-of-the-art results achieved with modular ensemble LLMs.
Abstract
Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
