Evolutionary System Prompt Learning for Reinforcement Learning in LLMs
Lunjun Zhang, Ryan Chen, Bradly C. Stadie

TL;DR
This paper introduces E-SPL, a method that combines reinforcement learning with evolutionary algorithms to jointly improve a large language model's system prompts and weights, improving performance on reasoning and agentic tasks.
Contribution
E-SPL is the first approach to evolve system prompts with genetic operators while simultaneously updating model weights with RL, leading to better generalization and efficiency.
Findings
Relative to RL alone, E-SPL raises the success rate from 38.8% to 45.1%.
E-SPL outperforms reflective prompt evolution methods.
Combining RL and prompt evolution yields consistent performance gains.
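For scale, assuming 38.8% is the RL-only baseline and 45.1% is E-SPL, the reported gain works out to:

```latex
\Delta_{\text{abs}} = 45.1\% - 38.8\% = 6.3 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{6.3}{38.8} \approx 16.2\%
```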
Abstract
Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL samples trajectories under multiple system prompts in parallel, then jointly applies RL updates to LLM weights and evolutionary updates to system prompts. System prompts evolve via mutation and crossover, two genetic operators driven by LLM self-reflection; selection is based on relative performance ratings updated across RL iterations. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in…
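The loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. Rollouts, LLM self-reflection (mutation and crossover), and the RL weight update are replaced by toy stubs (`rollout_success_rate`, `mutate`, `crossover`, `rl_update`). The "relative performance ratings" are assumed here to be Elo-style ratings updated from pairwise comparisons of each prompt's success rate; the K-factor, population cap, and the rule that children inherit their parent's rating are likewise assumptions. Prompts are evaluated sequentially below, whereas E-SPL samples them in parallel.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    prompt: str
    rating: float = 1000.0  # assumed Elo-style rating, not the paper's confirmed scheme


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update; score_a is 1.0 (A better), 0.5 (tie) or 0.0 (B better)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


# --- Hypothetical stubs standing in for LLM rollouts, self-reflection and RL ---

def rollout_success_rate(prompt: str, rng: random.Random, n: int = 16) -> float:
    """Toy stand-in for sampling n trajectories under a system prompt and grading them."""
    p = min(0.9, 0.2 + 0.01 * len(prompt.split()))  # toy: longer prompts succeed slightly more
    return sum(rng.random() < p for _ in range(n)) / n


def mutate(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for an LLM reflecting on trajectories and rewriting one prompt."""
    return f"{prompt} Lesson {rng.randint(0, 99)}: verify intermediate steps."


def crossover(a: str, b: str) -> str:
    """Toy stand-in for an LLM merging two parent prompts into one child."""
    return f"{a} {b.split('.')[0]}."


def rl_update(weights: dict, success_rates: list[float]) -> None:
    """Toy stand-in for a policy-gradient step on the shared model weights."""
    weights["steps"] = weights.get("steps", 0) + 1
    weights["last_mean_reward"] = sum(success_rates) / len(success_rates)


def espl_iteration(population: list[Candidate], weights: dict,
                   rng: random.Random, max_pop: int = 4) -> list[Candidate]:
    """One E-SPL-style iteration; assumes the population holds at least two prompts."""
    # 1. Sample trajectories under every system prompt.
    rates = [rollout_success_rate(c.prompt, rng) for c in population]
    # 2. Apply one RL update to the shared weights using all trajectories.
    rl_update(weights, rates)
    # 3. Update relative ratings from pairwise comparisons within this iteration.
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            score = 1.0 if rates[i] > rates[j] else 0.0 if rates[i] < rates[j] else 0.5
            a, b = population[i], population[j]
            a.rating, b.rating = elo_update(a.rating, b.rating, score)
    # 4. Evolve: mutate the best prompt and cross over the two best.
    ranked = sorted(population, key=lambda c: c.rating, reverse=True)
    best, second = ranked[0], ranked[1]
    # Children inherit the best parent's rating (an assumption) so they compete immediately.
    children = [Candidate(mutate(best.prompt, rng), best.rating),
                Candidate(crossover(best.prompt, second.prompt), best.rating)]
    # 5. Selection: keep the highest-rated prompts up to the population cap.
    return sorted(ranked + children, key=lambda c: c.rating, reverse=True)[:max_pop]


rng = random.Random(0)
weights: dict = {}
population = [Candidate("Solve the problem step by step."), Candidate("Be concise.")]
for _ in range(5):
    population = espl_iteration(population, weights, rng)
for c in population:
    print(f"{c.rating:7.1f}  {c.prompt[:60]}")
```

In this sketch, ratings accumulate across iterations, so a prompt has to outperform its peers repeatedly, not on a single noisy batch, to stay near the top. A real system would replace the stubs with model rollouts, LLM reflection calls, and an RL optimizer.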
