Iterative Deployment Improves Planning Skills in LLMs

Augusto B. Corrêa; Yoav Gelberg; Luckeciano C. Melo; Ilia Shumailov; André G. Pereira; Yarin Gal

arXiv:2512.24940·cs.AI·January 1, 2026

TL;DR

Iterative deployment of LLMs, in which each new model is fine-tuned on user-curated data from the previous model's deployment, enhances planning skills and induces emergent generalization. The process functions like reinforcement learning, but with an implicit rather than an explicit reward signal.

Contribution

This paper shows that iterative deployment improves LLM planning abilities, provides a theoretical link between this mechanism and reinforcement learning, and highlights its implications for AI safety and for model training.

Findings

01

Models show significant planning skill improvements after iterative deployment.

02

Later models discover longer, more complex plans than initial models.

03

The mechanism acts as an implicit reinforcement learning process.
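One way to see why curating outputs and then fine-tuning on them behaves like RL (finding 03) is a standard policy-gradient identity. The derivation below is an illustration under assumed notation, not the paper's own analysis: p_θ is the deployed model, r(x) ∈ {0,1} is a hypothetical indicator that users keep output x, θ_k are the parameters of the k-th deployed model, and x_1, …, x_N are outputs sampled during its deployment.

```latex
J(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ r(x) \right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ r(x)\, \nabla_\theta \log p_\theta(x) \right]

\mathcal{L}_k(\theta) = -\frac{1}{N} \sum_{i=1}^{N} r(x_i) \log p_\theta(x_i),
\qquad x_i \sim p_{\theta_k}

\mathbb{E}\left[ \nabla_\theta \mathcal{L}_k(\theta) \right]_{\theta = \theta_k}
= -\mathbb{E}_{x \sim p_{\theta_k}}\left[ r(x)\, \nabla_\theta \log p_\theta(x) \right]_{\theta = \theta_k}
= -\nabla_\theta J(\theta_k)
```

In expectation, then, a gradient step on the fine-tuning loss taken at θ_k is a policy-gradient ascent step on J for the implicit reward r. Normalizing by the number of kept samples instead of N only rescales the gradient by a positive factor (roughly the inverse acceptance rate), so its direction is unchanged.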

Abstract

We show that iterative deployment of large language models (LLMs), each fine-tuned on data carefully curated by users from the previous models' deployment, can significantly change the properties of the resultant models. By testing this mechanism on various planning domains, we observe substantial improvements in planning skills, with later models displaying emergent generalization by discovering much longer plans than the initial models. We then provide theoretical analysis showing that iterative deployment effectively implements reinforcement learning (RL) training in the outer-loop (i.e. not as part of intentional model training), with an implicit reward function. The connection to RL has two important implications: first, for the field of AI safety, as the reward function entailed by repeated deployment is not defined explicitly, and could have unexpected implications to the…
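The deploy, curate, fine-tune loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the authors' experimental setup: the "model" is a categorical distribution over ten abstract outputs standing in for candidate plans, the hypothetical `reward` function stands in for users' curation (outputs 7 to 9 are assumed valid), and "fine-tuning" is a smoothed maximum-likelihood refit on the curated samples. All function names are illustrative.

```python
import random
from collections import Counter


def reward(x: int) -> int:
    # Hypothetical implicit reward: users keep an output only if it is "valid".
    # In this toy, outputs 7, 8 and 9 are assumed valid; all others are discarded.
    return 1 if x >= 7 else 0


def deploy_and_curate(probs: list[float], n_samples: int, rng: random.Random) -> list[int]:
    """Sample outputs from the current model and keep only those users accept."""
    outputs = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    return [x for x in outputs if reward(x) == 1]


def fine_tune(curated: list[int], n_outputs: int, smoothing: float = 1.0) -> list[float]:
    """Fit the next model by maximum likelihood on the curated data only.

    Additive smoothing keeps every output reachable; it is an assumption of
    this sketch. Ignoring smoothing, the expected refit is proportional to
    p_old(x) * reward(x), i.e. the old model reweighted by the implicit reward.
    """
    counts = Counter(curated)
    total = len(curated) + smoothing * n_outputs
    return [(counts[i] + smoothing) / total for i in range(n_outputs)]


def success_rate(probs: list[float]) -> float:
    """Probability mass the model places on valid outputs."""
    return sum(p for i, p in enumerate(probs) if reward(i) == 1)


def iterative_deployment(n_outputs: int = 10, generations: int = 5,
                         n_samples: int = 200, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    probs = [1.0 / n_outputs] * n_outputs  # generation 0: uniform model
    history = [success_rate(probs)]
    for _ in range(generations):
        curated = deploy_and_curate(probs, n_samples, rng)  # deployment + user curation
        probs = fine_tune(curated, n_outputs)               # next model trained on curated data
        history.append(success_rate(probs))
    return history


if __name__ == "__main__":
    for gen, rate in enumerate(iterative_deployment()):
        print(f"generation {gen}: success rate {rate:.3f}")
```

For example, `iterative_deployment(generations=3)` returns four success rates, starting from roughly 0.3 for the uniform model and rising as each generation is trained only on outputs that passed curation. No explicit reward is ever optimized; the filtering step alone pushes the model toward what users accept.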


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning