Inference-Time Diversity in RL-Trained Lean Theorem Provers: A Diagnostic Study

Zachary Burton

arXiv:2601.16172·cs.AI·May 19, 2026

Inference-Time Diversity in RL-Trained Lean Theorem Provers: A Diagnostic Study

Zachary Burton

PDF

TL;DR

This study investigates inference-time diversity in RL-trained Lean theorem provers, revealing that strategic prompt variations significantly improve theorem-solving performance and are specific to RL training stages.

Contribution

It introduces a diagnostic approach to understanding inference-time diversity effects, demonstrating that prompt diversity can mitigate mode-collapse in RL-trained provers.

Findings

01

Doubling sampling budget yields no additional solved theorems.

02

A fixed schedule of tactic skeletons improves theorem-solving by 45%.

03

Diversity interventions have RL-specific effects, not seen in SFT-trained models.

Abstract

RL-trained Lean theorem provers mode-collapse at inference time: on miniF2F-test with DeepSeek-Prover-V1.5-RL, doubling the i.i.d.\ sampling budget from $k = 32$ to $k = 64$ produces zero additional solved theorems (42/244 in both cases). A fixed schedule of 15 tactic skeletons breaks this plateau and recovers a $+ 45$ relative improvement at $k = 16$ (mean $Δ = + 12.3 \pm 4.2$ theorems across $n = 3$ seeds, sign preserved in every seed). A controlled diversity ablation rules out the prompt-diversity confound: tactic skeletons help, paraphrases match the baseline, and irrelevant Lean comments actively degrade. A leave-one-out formalization-difficulty stratification reveals a structural-content gradient across the three perturbations. The phenomenon is RL-specific: V1.5-Base proves zero theorems regardless of intervention, identifying RL as the stage that creates the proof capability…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification