Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover

Jui-Hui Chung; Hongzhou Lin; Lai Jiang; Shange Tang; Chi Jin

arXiv:2604.08388·cs.AI·April 10, 2026

Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover

Jui-Hui Chung, Hongzhou Lin, Lai Jiang, Shange Tang, Chi Jin

PDF

TL;DR

Fine-tuning on formal mathematics can suppress a model's tool use, but a small amount of targeted data can restore and transfer this capability effectively.

Contribution

Demonstrates that domain-specific fine-tuning suppresses but does not erase tool use, and small, targeted data can revive this ability across tasks.

Findings

01

Specialized fine-tuning reduces tool-calling accuracy from 89.4% to nearly 0%.

02

100 agentic traces restore tool use, improving performance significantly.

03

Recovered capabilities transfer across domains and improve in-domain performance.

Abstract

Heavy supervised fine-tuning on a target domain can strongly suppress capabilities that were present in the base model. We study this phenomenon in formal mathematics using Goedel-Prover-V2, an open-source model heavily trained on 1.8 million formal-math examples. After domain specialization, the model almost completely loses its ability to produce valid tool calls, even when explicitly instructed to use tools, dropping from 89.4% function-calling accuracy in the base model to nearly 0%. We ask whether this agentic collapse is permanent or instead reversible. To answer this question, we fine-tune the specialized model on a small amount of Lean-specific tool-use data. Remarkably, as few as 100 agentic traces are sufficient to restore strong tool-calling behavior. Importantly, this recovery is not the result of reward hacking or benchmark-specific optimization: the recovery data is…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.