FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman

TL;DR
FORGE is a population-based protocol that enables LLM agents to improve decision-making by self-generating and evolving memory artifacts without weight updates, significantly enhancing performance in a network-defense task.
Contribution
The paper introduces FORGE, a novel method for evolving LLM agent memory through population broadcast and staged learning, eliminating the need for gradient updates.
Findings
FORGE improves average returns by up to 7.7 times over zero-shot baselines.
Population broadcast is essential for performance gains.
Examples-based memory yields the strongest results for most models.
Abstract
Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop that propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families…
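The staged protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Instance`, `run_forge`, `evaluate`, and `reflect` are hypothetical. The fixed score threshold used for graduation is an assumption, since the abstract does not state the criterion. Here `evaluate` stands in for running the agent in the environment with the given memory, and `reflect` stands in for the reflection agent that turns a failed attempt into a new memory artifact.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    """One population member: its prompt-injected memory and status (hypothetical)."""
    memory: List[str] = field(default_factory=list)
    graduated: bool = False
    score: float = float("-inf")


def run_forge(
    population: List[Instance],
    evaluate: Callable[[List[str]], float],
    reflect: Callable[[List[str], float], str],
    stages: int,
    episodes_per_stage: int,
    graduation_threshold: float,  # assumed criterion, not taken from the paper
) -> Instance:
    for _ in range(stages):
        # Inner loop (Reflexion-style): act, then reflect on failures.
        for inst in population:
            if inst.graduated:
                continue  # graduated instances are frozen
            for _ in range(episodes_per_stage):
                inst.score = evaluate(inst.memory)
                if inst.score >= graduation_threshold:
                    inst.graduated = True
                    break
                # A failed attempt becomes a new memory artifact.
                inst.memory = inst.memory + [reflect(inst.memory, inst.score)]

        # Outer loop: broadcast the best instance's memory to the
        # instances that have not graduated yet.
        best = max(population, key=lambda i: i.score)
        for inst in population:
            if inst is not best and not inst.graduated:
                inst.memory = list(best.memory)

    return max(population, key=lambda i: i.score)
```

In this sketch no model weights change at any point: the only state that evolves is the list of natural-language strings in each instance's `memory`, which is what would be injected into the agent's prompt.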
