MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

Tristan Tomilin; Luka van den Boogaard; Samuel Garcin; Bram Grooten; Meng Fang; Yali Du; Mykola Pechenizkiy

arXiv:2506.14990·cs.AI·September 9, 2025

Open Access·3 Reviews

TL;DR

MEAL is a new benchmark designed for continual multi-agent reinforcement learning, enabling efficient GPU-accelerated evaluation of algorithms across long task sequences, highlighting challenges in scalability and coordination.

Contribution

We introduce MEAL, the first GPU-accelerated benchmark for CMARL, facilitating scalable evaluation and analysis of continual learning in multi-agent environments.

Findings

01

Naive combinations of CL and MARL methods perform well on simple tasks.

02

Performance drops on complex environments requiring coordination.

03

Architectural and algorithmic features are critical for success.

Abstract

Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms, with environment availability strongly impacting research. One particularly underexplored intersection is continual learning (CL) in cooperative multi-agent settings. To remedy this, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark tailored for continual multi-agent reinforcement learning (CMARL). Existing CL benchmarks run environments on the CPU, leading to computational bottlenecks and limiting the length of task sequences. MEAL leverages JAX for GPU acceleration, enabling continual learning across sequences of 100 tasks on a standard desktop PC in a few hours. We show that naively combining popular CL and MARL methods yields strong performance on simple environments, but fails to scale to more complex settings requiring sustained…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 3

Strengths

There is a growing need for larger numbers of tasks in CRL and multi-task RL, which this benchmark addresses. The results and discussion are presented clearly, with useful ablations and performance measures (forgetting, dormant ratio, etc.). The paper includes numerous interesting appendices, and adjacent settings such as curriculum learning are a nice addition.

Weaknesses

The claim that it is the first continual RL library to leverage JAX should be qualified by carefully distinguishing MEAL from existing, non-benchmark libraries, such as ReDo (https://github.com/google/dopamine/tree/master/dopamine/labs/redo). Although PyTorch-based, Plasticine is also a relevant comparison (https://github.com/RLE-Foundation/Plasticine/tree/main). Whilst there are clearly defined contributions in procedural generation and establishing CL, POMDP and Curriculum settings, the framework seems to also re-i

Reviewer 02·Rating 2·Confidence 3

Strengths

- The problem setting is interesting and novel. I am not aware of any prior work considering the continual multi-agent reinforcement learning paradigm.
- The paper is generally well-written and organized. The experiments use standard continual RL evaluation metrics including forgetting and forward transfer.
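
As an illustration of the forgetting metric mentioned above, the following is a minimal sketch of one common definition from the continual learning literature: for each task, the score measured right after training on that task minus the score measured at the end of the whole sequence, averaged over all tasks except the last. The function name `average_forgetting`, the layout of the `perf` matrix, and the example numbers are assumptions made for this sketch; the exact definition used in MEAL is not given in this excerpt and may differ.

```python
def average_forgetting(perf):
    """Average forgetting over a task sequence.

    perf[i][j] is the evaluation score on task j, measured right after
    training on task i has finished (hypothetical layout for this sketch).
    """
    n = len(perf)
    if n < 2:
        # With a single task there is nothing that could have been forgotten.
        return 0.0
    final = perf[-1]  # scores on every task at the end of the sequence
    # Drop in score on each earlier task between "just learned" and "end".
    drops = [perf[j][j] - final[j] for j in range(n - 1)]
    return sum(drops) / len(drops)


# Illustrative scores for a 3-task sequence (not results from the paper).
scores = [
    [0.9, 0.1, 0.0],  # after training on task 0
    [0.6, 0.8, 0.1],  # after training on task 1
    [0.5, 0.6, 0.9],  # after training on task 2
]
print(average_forgetting(scores))
```

A positive value means earlier tasks scored lower at the end of the sequence than right after they were learned; a value near zero means little was lost.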

Weaknesses

1. MEAL only considers non-stationarity in changing environment layouts (Line 1326) while the observation space and action space are the same between "tasks" in MEAL. Given this, I am not certain if the setting studied in this paper is truly continual RL. In particular, this distribution shift from procedural generation is most similar to individual environments in Procgen [1]. For comparison, Jelly Bean World [2] uses an infinite grid world to achieve non-stationarity, Continual World [3] uses d

Reviewer 03·Rating 4·Confidence 4

Strengths

- Timely and relevant benchmark: Addresses the underexplored intersection of continual learning and cooperative multi-agent reinforcement learning, filling a clear gap in existing benchmarks.
- GPU-accelerated simulation: The use of JAX for end-to-end GPU-based training is a notable practical advancement, drastically reducing wall-clock training time and enabling long task sequences (up to 100 tasks on a single GPU).
- Comprehensive baseline coverage: Six continual learning baselines implemented

Weaknesses

- Claim of first CMARL benchmark requires clarification: The authors state (line 039) that MEAL is the first continual MARL benchmark, but works such as “Multi-Agent Continual Coordination via Progressive Task Contextualization” (Yuan et al., 2024) already explore continual multi-agent coordination.
- Task ID dependence (line 305 & Figure 6): Since the ablation suggests task IDs have negligible effect, it would strengthen the benchmark if baseline results without task ID access were also reporte

Taxonomy

Topics: Reinforcement Learning in Robotics