Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Xing Zhang; Yanwei Cui; Guanghui Wang; Ziyuan Li; Wei Qiu; Bing Zhu; Peiyang He

arXiv:2605.22148 · cs.AI · May 22, 2026


TL;DR

Ratchet is a minimal single-agent loop in which a frozen LLM writes, retrieves, curates, and retires its own natural-language skills, lifting held-out pass@1 on MBPP+ hard-100 from 0.258 to a late-window rolling mean of 0.584 without weight updates.

Contribution

The paper introduces Ratchet, a hygiene recipe for self-evolving LLM agents that combines four mechanisms: outcome-driven retirement, a bounded active-cap, meta-skill authoring guidance, and pattern canonicalisation.

Findings

01

Ratchet lifts held-out pass@1 on MBPP+ hard-100 from a 0.258 ± 0.047 baseline to a late-window rolling mean of 0.584 (peak 0.658 ± 0.042).

02

The recipe transfers successfully to an agentic solver on SWE-bench.

03

Ablation studies show retirement and meta-skill guidance are essential.

Abstract

Self-evolving skill libraries, pioneered by Voyager, let frozen LLM agents accumulate reusable knowledge without weight updates, yet recent evaluation shows that LLM-authored skills deliver $+0.0$ pp over no-skill baselines while human-curated ones deliver $+16.2$ pp: the bottleneck is not skill authoring but lifecycle management. We introduce Ratchet, a single-agent loop in which a frozen LLM writes, retrieves, curates, and retires its own natural-language skills. Ratchet integrates four candidate hygiene mechanisms: outcome-driven retirement, a bounded active-cap, meta-skill authoring guidance, and pattern canonicalisation. On MBPP+ hard-100 with Claude Opus 4.7, Ratchet lifts held-out pass@1 from a $0.258 \pm 0.047$ baseline to a late-window rolling mean of $0.584$ (peak $0.658 \pm 0.042$) across 100 rounds and 3 seeds, a $+0.328 \pm 0.018$ rolling-mean gain where the no-skill…
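To make two of the hygiene mechanisms named in the abstract concrete, here is a minimal Python sketch of outcome-driven retirement combined with a bounded active-cap. It is an illustration under stated assumptions, not the paper's implementation: the class names `Skill` and `SkillLibrary`, the thresholds `active_cap`, `min_uses`, and `retire_below`, and the rule of retiring the lowest win-rate incumbent when the cap is exceeded are all hypothetical choices that the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """A natural-language skill plus its outcome counters (hypothetical schema)."""
    name: str
    text: str
    uses: int = 0
    wins: int = 0
    retired: bool = False

    @property
    def win_rate(self) -> float:
        # Untried skills report 0.0; callers decide how to treat them.
        return self.wins / self.uses if self.uses else 0.0


@dataclass
class SkillLibrary:
    # All three thresholds are illustrative assumptions, not values from the paper.
    active_cap: int = 3          # maximum number of non-retired skills
    min_uses: int = 4            # evidence required before outcome-driven retirement
    retire_below: float = 0.25   # win-rate floor for staying active
    skills: dict[str, Skill] = field(default_factory=dict)

    def active(self) -> list[Skill]:
        """Return non-retired skills in insertion order."""
        return [s for s in self.skills.values() if not s.retired]

    def add(self, name: str, text: str) -> None:
        """Add a new skill (names assumed unique) and enforce the active cap.

        The newcomer is exempt from this round of cap enforcement so that it
        gets a chance to be tried; the weakest incumbents are retired instead.
        """
        incumbents = self.active()
        self.skills[name] = Skill(name, text)
        excess = len(incumbents) + 1 - self.active_cap
        if excess > 0:
            for s in sorted(incumbents, key=lambda s: s.win_rate)[:excess]:
                s.retired = True

    def record(self, name: str, passed: bool) -> None:
        """Log one outcome and retire the skill if its evidence is poor."""
        s = self.skills[name]
        s.uses += 1
        s.wins += int(passed)
        if s.uses >= self.min_uses and s.win_rate < self.retire_below:
            s.retired = True
```

In a full loop, a frozen LLM would call something like `add` after authoring a skill and `record` after each evaluated attempt that retrieved it; retrieval, meta-skill guidance, and pattern canonicalisation are left out of this sketch.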
