General learned delegation by clones

Darren Li; Meiqi Chen; Chenze Shao; Fandong Meng; Jie Zhou

arXiv:2602.13262·cs.AI·February 17, 2026

Open Access

TL;DR

SELFCEST is a reinforcement learning method that enables language models to efficiently allocate computation across parallel clones, improving accuracy and cost-effectiveness on complex reasoning tasks.

Contribution

The paper introduces a learned controller that spawns same-weight clones of the model in parallel contexts and allocates generation and context budget among them, improving reasoning performance and generalization under a fixed inference budget.

Findings

01. Improves the accuracy-cost tradeoff on math reasoning benchmarks
02. Enhances out-of-distribution generalization
03. Outperforms monolithic baselines at matched inference budgets

Abstract

Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts by agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.
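
The delegation pattern described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the names `Branch`, `split_budget`, `delegate`, and the `generate` callable are hypothetical, the allocation weights are supplied by hand rather than produced by a learned controller, and only the generation budget is modeled (the abstract also mentions context budget). It shows one way a fixed token budget might be divided across clones that run in separate parallel contexts.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Branch:
    """One clone's assignment (hypothetical structure, for illustration)."""
    prompt: str      # sub-task text handed to the clone
    max_tokens: int  # generation budget granted to this clone


def split_budget(total_tokens: int, weights: List[float]) -> List[int]:
    """Divide total_tokens in proportion to weights without exceeding the total."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    scale = sum(weights)
    shares = [int(total_tokens * w / scale) for w in weights]
    # Rounding down can leave a few tokens unassigned; give them to the
    # first highest-weight branch so the shares sum to the total exactly.
    shares[weights.index(max(weights))] += total_tokens - sum(shares)
    return shares


def delegate(
    generate: Callable[[str, int], str],
    subtasks: List[str],
    weights: List[float],
    total_tokens: int,
) -> List[str]:
    """Run one clone per sub-task in parallel under a shared token budget.

    `generate(prompt, max_tokens)` stands in for a call to the same-weight
    model; it is an assumption of this sketch, not an API from the paper.
    """
    budgets = split_budget(total_tokens, weights)
    branches = [Branch(p, b) for p, b in zip(subtasks, budgets)]
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        return list(pool.map(lambda br: generate(br.prompt, br.max_tokens), branches))
```

In this sketch the weights play the role that the abstract assigns to the learned controller: in SELFCEST the allocation is learned end-to-end from a global task reward, whereas here it is a fixed input. Holding `total_tokens` constant across configurations is what makes a comparison "matched inference budget" in the sense used above.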

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques