MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
Lu Yang, Zelai Xu, Minyang Xie, Jiaxuan Gao, Zhao Shok, Yu Wang, Yi Wu

TL;DR
MAGE introduces a meta-reinforcement learning framework for language agents that enhances their ability to strategically explore and exploit in multi-agent environments, leading to improved adaptability and generalization.
Contribution
The paper presents MAGE, a novel meta-RL approach for LLMs that incorporates multi-episode training, reflection, and population-based techniques to improve strategic exploration and exploitation.
Findings
MAGE outperforms baselines in exploration and exploitation tasks.
MAGE generalizes well to unseen opponents.
The framework enhances long-term adaptability of LLM agents.
Abstract
Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail to internalize the adaptive ability required for long-term improvement. Meta-Reinforcement Learning (meta-RL) provides an alternative by embedding the learning process directly within the model. However, existing meta-RL approaches for LLMs focus primarily on exploration in single-agent settings, neglecting the strategic exploitation necessary in multi-agent environments. We propose MAGE, a meta-RL framework that equips LLM agents for strategic exploration and exploitation. MAGE uses a multi-episode training regime in which interaction histories and reflections are integrated into the context window. By using the final episode reward…
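The abstract describes a multi-episode regime in which interaction histories and reflections accumulate in the context window, and the contribution mentions population-based techniques. The Python sketch below illustrates that loop under clearly stated assumptions: the rock-paper-scissors game, the population of fixed-move opponents, and the helpers `toy_agent`, `play_episode`, `reflect`, and `run_trial` are all hypothetical stand-ins for an LLM policy and environment, not MAGE's implementation. Because the abstract is truncated at "By using the final episode reward…", the sketch only reports the final-episode reward and makes no claim about how MAGE uses it.

```python
import random
from typing import Callable, List, Tuple

# Toy rock-paper-scissors setting (an illustrative assumption, not from the paper).
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # move -> its counter


def score(mine: str, theirs: str) -> int:
    """+1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1


def toy_agent(context: List[str]) -> str:
    """Hypothetical stand-in for the LLM policy: exploit the latest reflection, else probe."""
    prefix = "Reflection: opponent played "
    for line in reversed(context):
        if line.startswith(prefix):
            return BEATS[line[len(prefix):]]
    return "rock"  # no information in context yet: an exploratory default move


def play_episode(agent: Callable[[List[str]], str], context: List[str],
                 opponent: str) -> Tuple[str, int]:
    """Play one episode against a fixed-move opponent; return (transcript, reward)."""
    mine = agent(context)
    return f"I played {mine}, opponent played {opponent}", score(mine, opponent)


def reflect(transcript: str, reward: int) -> str:
    """Hypothetical stand-in for an LLM-written reflection (reward unused in this toy)."""
    return f"opponent played {transcript.rsplit(' ', 1)[-1]}"


def run_trial(agent: Callable[[List[str]], str], opponent: str,
              num_episodes: int) -> Tuple[List[str], List[int]]:
    """One multi-episode trial: each episode's history and reflection join the context."""
    context: List[str] = []
    rewards: List[int] = []
    for ep in range(num_episodes):
        transcript, reward = play_episode(agent, context, opponent)
        rewards.append(reward)
        context.append(f"Episode {ep + 1}: {transcript} (reward {reward})")
        context.append(f"Reflection: {reflect(transcript, reward)}")
    return context, rewards


if __name__ == "__main__":
    population = ["rock", "paper", "scissors"]  # hypothetical fixed-move opponents
    rng = random.Random(0)
    opponent = rng.choice(population)  # sample one opponent per trial
    context, rewards = run_trial(toy_agent, opponent, num_episodes=3)
    print(opponent, rewards, "final-episode reward:", rewards[-1])
```

In this toy, the first episode is a blind probe that may lose, while later episodes exploit what the reflection left in the context, so the final-episode reward is the same (a win) against every opponent in the population.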
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
