From AI Assistant to AI Scientist: Autonomous Discovery of LLM-RL Algorithms with LLM Agents
Sirui Xia, Yikai Zhang, Aili Chen, Siye Wu, Siyu Yuan, Yanghua Xiao

TL;DR
This paper introduces POISE, a framework that automates the discovery of policy optimization algorithms for language models, leading to improved performance and interpretable mechanisms through an evidence-driven, closed-loop process.
Contribution
POISE is a novel automated framework that searches over algorithmic mechanisms for language model training, enabling the discovery of improved policy optimization algorithms with interpretable design principles.
Findings
Discovered improved mechanisms, including analytic-variance scaling and validity masking.
Achieved a +4.6 increase in weighted Overall performance.
Improved AIME25 pass@32 from 26.7% to 43.3%.
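For readers unfamiliar with the pass@32 metric, the sketch below shows one common way to estimate pass@k from sampled completions. The source does not say how many samples were drawn or which estimator was used, so the function names and the per-problem counts here are illustrative assumptions. Assuming the standard 30-problem AIME 2025 set, the reported 26.7% and 43.3% are consistent with 8 and 13 solved problems respectively.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that were correct, k: evaluation budget.
    Returns the probability that at least one of k samples drawn without
    replacement from the n is correct.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Illustrative counts only (not data from the paper): 13 of 30 problems
# have at least one correct sample among n = 32, the other 17 have none.
counts = [1] * 13 + [0] * 17
score = benchmark_pass_at_k(counts, n=32, k=32)
```

When n equals k, the estimator reduces to checking whether any sample for a problem is correct, so the benchmark score is simply the fraction of problems solved at least once.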
Abstract
Discovering improved policy optimization algorithms for language models remains a costly manual process requiring repeated mechanism-level modification and validation. Unlike simple combinatorial code search, this problem requires searching over algorithmic mechanisms tightly coupled with training dynamics while reusing empirical evidence across iterations. We propose POISE, a closed-loop framework for automated discovery of policy optimization algorithms for language models. POISE maintains a structured, genealogically linked archive linking proposals, executable implementations, standardized evaluations, and natural-language reflections to support evidence-driven iteration. In mathematical reasoning experiments starting from GRPO, POISE evaluates 64 candidate algorithms and discovers improved mechanisms, including analytic-variance scaling and validity masking. The best variant…
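The archive described in the abstract can be pictured as a set of records, each pointing to its parent. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names, field names, and the `best` and `lineage` helpers are all hypothetical, and the scores in the usage lines are placeholders rather than results.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ArchiveEntry:
    """One candidate algorithm and the evidence gathered about it (hypothetical schema)."""
    entry_id: str
    parent_id: Optional[str]          # genealogical link; None for the starting algorithm
    proposal: str                     # natural-language description of the mechanism
    implementation: str               # reference to the executable implementation
    evaluation: Dict[str, float] = field(default_factory=dict)  # standardized metrics
    reflection: str = ""              # natural-language analysis of the outcome


class Archive:
    """In-memory store that keeps parent links valid and supports simple queries."""

    def __init__(self) -> None:
        self._entries: Dict[str, ArchiveEntry] = {}

    def add(self, entry: ArchiveEntry) -> None:
        if entry.entry_id in self._entries:
            raise ValueError(f"duplicate entry id: {entry.entry_id}")
        if entry.parent_id is not None and entry.parent_id not in self._entries:
            raise KeyError(f"unknown parent: {entry.parent_id}")
        self._entries[entry.entry_id] = entry

    def lineage(self, entry_id: str) -> List[ArchiveEntry]:
        """Return the chain from the root ancestor down to the given entry."""
        chain: List[ArchiveEntry] = []
        current: Optional[ArchiveEntry] = self._entries[entry_id]
        while current is not None:
            chain.append(current)
            parent = current.parent_id
            current = self._entries[parent] if parent is not None else None
        return list(reversed(chain))

    def best(self, metric: str) -> ArchiveEntry:
        """Return the entry with the highest value for a metric."""
        scored = [e for e in self._entries.values() if metric in e.evaluation]
        if not scored:
            raise ValueError(f"no entry reports metric: {metric}")
        return max(scored, key=lambda e: e.evaluation[metric])


# Usage with placeholder scores (not values from the paper).
archive = Archive()
archive.add(ArchiveEntry("grpo", None, "baseline", "grpo.py", {"overall": 0.0}))
archive.add(ArchiveEntry("v1", "grpo", "candidate mechanism", "v1.py",
                         {"overall": 1.0}, "placeholder reflection"))
top = archive.best("overall")
path = [e.entry_id for e in archive.lineage(top.entry_id)]
```

Keeping the parent link on every record is what lets a later iteration trace which mechanism changes preceded a gain or a regression, which is the kind of evidence reuse the abstract describes.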
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Materials Science · Topic Modeling
