General learned delegation by clones
Darren Li, Meiqi Chen, Chenze Shao, Fandong Meng, Jie Zhou

TL;DR
SELFCEST is a reinforcement learning method that enables language models to efficiently allocate computation across parallel clones, improving accuracy and cost-effectiveness on complex reasoning tasks.
Contribution
SELFCEST introduces a learned controller that decides when to spawn parallel same-weight clones and how to allocate generation and context budget among them, improving reasoning performance and generalization under fixed inference budgets.
Findings
Improves the accuracy-cost Pareto frontier on math reasoning and long-context multi-hop QA benchmarks
Exhibits out-of-distribution generalization in both domains
Outperforms monolithic baselines at matched inference budgets
Abstract
Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts by agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.
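The delegation pattern described in the abstract (split a fixed budget across branches, run same-weight clones in separate contexts in parallel, collect the results) can be illustrated with a minimal sketch. This is not the authors' implementation: `Branch`, `allocate_budget`, `toy_clone`, and `delegate` are hypothetical names, the clone is a stub that only truncates text, and the branch weights are set by hand, whereas in SELFCEST the allocation is produced by a controller learned with reinforcement learning.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Branch:
    subtask: str  # text handed to the clone as its own separate context
    weight: int   # hand-set priority (assumption; the paper learns this)


def allocate_budget(branches, total_tokens):
    """Split a fixed token budget across branches in proportion to weight."""
    total_weight = sum(b.weight for b in branches)
    # Integer floor division keeps the sum of allocations <= total_tokens.
    return [total_tokens * b.weight // total_weight for b in branches]


def toy_clone(subtask, max_tokens):
    """Stub for a same-weight clone: 'generates' at most max_tokens words."""
    return " ".join(subtask.split()[:max_tokens])


def delegate(branches, total_tokens, clone_fn=toy_clone):
    """Run one clone per branch in parallel, each under its own budget."""
    budgets = allocate_budget(branches, total_tokens)
    subtasks = [b.subtask for b in branches]
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        outputs = list(pool.map(clone_fn, subtasks, budgets))
    return budgets, outputs


branches = [
    Branch("check the base case of the induction", weight=3),
    Branch("bound the error term", weight=1),
]
budgets, outputs = delegate(branches, total_tokens=8)
print(budgets)  # [6, 2]
print(outputs)
```

The total spend never exceeds the global budget, which is the constraint under which the paper compares against monolithic baselines; replacing `toy_clone` with a real model call and the fixed weights with a learned policy is where the actual method would differ from this sketch.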
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
