MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

Lu Yang; Zelai Xu; Minyang Xie; Jiaxuan Gao; Zhao Shok; Yu Wang; Yi Wu

arXiv:2603.03680 · cs.AI · March 5, 2026



TL;DR

MAGE introduces a meta-reinforcement learning framework for language agents that enhances their ability to strategically explore and exploit in multi-agent environments, leading to improved adaptability and generalization.

Contribution

The paper presents MAGE, a novel meta-RL approach for LLMs that incorporates multi-episode training, reflection, and population-based techniques to improve strategic exploration and exploitation.

Findings

01  MAGE outperforms baselines in exploration and exploitation tasks.

02  MAGE generalizes well to unseen opponents.

03  The framework enhances long-term adaptability of LLM agents.

Abstract

Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail to internalize the adaptive ability required for long-term improvement. Meta-Reinforcement Learning (meta-RL) provides an alternative by embedding the learning process directly within the model. However, existing meta-RL approaches for LLMs focus primarily on exploration in single-agent settings, neglecting the strategic exploitation necessary for multi-agent environments. We propose MAGE, a meta-RL framework that empowers LLM agents for strategic exploration and exploitation. MAGE utilizes a multi-episode training regime where interaction histories and reflections are integrated into the context window. By using the final episode reward…
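The multi-episode regime described in the abstract, where earlier interaction histories and reflections are placed in the context window for later episodes, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `Episode`, `build_context`, and `run_trial`, and the `policy`, `env_step`, and `reflect` callables, are all hypothetical stand-ins, and the training objective (which the truncated abstract does not fully state) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    """One episode: a list of (observation, action, reward) steps plus a reflection."""
    steps: List[Tuple[str, str, float]]
    reflection: str = ""

    @property
    def total_reward(self) -> float:
        return sum(reward for _, _, reward in self.steps)


def build_context(task: str, past: List[Episode]) -> str:
    """Serialize all earlier episodes and their reflections into one prompt string."""
    parts = [f"Task: {task}"]
    for i, ep in enumerate(past, start=1):
        parts.append(f"--- Episode {i} (return {ep.total_reward:.1f}) ---")
        for obs, action, reward in ep.steps:
            parts.append(f"obs={obs} | action={action} | reward={reward:.1f}")
        if ep.reflection:
            parts.append(f"Reflection: {ep.reflection}")
    parts.append(f"--- Episode {len(past) + 1} ---")
    return "\n".join(parts)


def run_trial(
    task: str,
    policy: Callable[[str, str], str],                   # hypothetical: (context, obs) -> action
    env_step: Callable[[str, str], Tuple[str, float]],   # hypothetical: (obs, action) -> (next_obs, reward)
    reflect: Callable[[Episode], str],                   # hypothetical: episode -> reflection text
    n_episodes: int = 3,
    horizon: int = 2,
) -> List[Episode]:
    """Run several episodes in sequence; each one sees the history of all earlier ones."""
    episodes: List[Episode] = []
    for _ in range(n_episodes):
        context = build_context(task, episodes)
        obs = "start"
        steps = []
        for _ in range(horizon):
            action = policy(context, obs)
            next_obs, reward = env_step(obs, action)
            steps.append((obs, action, reward))
            obs = next_obs
        episode = Episode(steps=steps)
        episode.reflection = reflect(episode)
        episodes.append(episode)
    return episodes
```

In this sketch the agent's only memory across episodes is the serialized context, so any improvement from one episode to the next has to come from how the policy reads the accumulated history and reflections.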


Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning