The Cell Must Go On: Agar.io for Continual Reinforcement Learning

Mohamed A. Mohamed; Kateryna Nekhomiazh; Vedant Vyas; Marcos M. Jose; Andrew Patterson; Marlos C. Machado

arXiv:2505.18347 · cs.LG · March 10, 2026

TL;DR

This paper introduces AgarCL, a novel continual reinforcement learning platform based on the game Agar.io, to evaluate agents in a complex, evolving environment with stochastic dynamics and partial observability.

Contribution

The paper presents AgarCL, a new high-dimensional, non-episodic environment for continual RL research, along with benchmark results and analysis of existing continual learning methods.

Findings

01. Standard RL algorithms perform comparably to specialized continual learning methods in AgarCL.
02. AgarCL's environment complexity challenges current continual RL approaches.
03. The paper reports benchmark results for DQN, PPO, and SAC on AgarCL and its sub-tasks.

Abstract

Continual reinforcement learning (RL) concerns agents that are expected to learn continually, rather than converge to a policy that is then fixed for evaluation. This setting is well-suited to environments that the agent perceives as changing over time, rendering any static policy ineffective. In continual RL, researchers often simulate such changes either by modifying episodic environments to incorporate task shifts during interaction or by designing simulators that explicitly model continual dynamics. However, transforming episodic problems into continual ones primarily captures scenarios involving abrupt changes in the data stream and still relies on episodic structure. Meanwhile, the few simulators explicitly designed for empirical continual RL research are often limited in scope or complexity. In this paper, we introduce AgarCL, a research platform for continual RL that enables…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The manuscript targets a gap in CRL benchmarks by providing a non-episodic environment with smooth endogenous non-stationarity, moving beyond artificial task switches common in prior work.
2. The environment’s combination of partial observability, continuous control, high-dimensional observations, and potentially infinite horizon is well aligned with realistic continual learning challenges.
3. The manuscript provides a detailed description of the platform and its dynamics, which can help rese…

Weaknesses

1. The originality of the platform is limited, as it largely adapts Agar.io for RL without clear evidence of novel environment design beyond configuration.
2. The evaluation does not include algorithms specifically designed for CRL (or methods targeting non-stationarity, online adaptation, memory consolidation, or meta-learning), making it hard to assess whether the environment differentiates among approaches intended for this setting.
3. The baselines (DQN, PPO, SAC) are standard and not state-…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The paper presents a clear and well-structured formalization of the problem, with solid motivation and coherent methodology. The presentation is generally good, and the theoretical framing is appealing.

Weaknesses

The contribution lacks strong novelty, as it mostly adapts an existing game setup rather than introducing new concepts. The analysis of opponent policies could be expanded, for example by addressing non-stationary behaviors. While the focus is not on continual learning (CL), it would be valuable to highlight the method’s compatibility with existing CL frameworks. Finally, a few figures and tables would benefit from clearer legends for better readability.

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The rewards and hybrid action space (mimicking cursor control with discrete actions) are well designed and align with the gameplay dynamics.
2. The accompanying video provides a clear, intuitive overview of the environment, helping readers unfamiliar with [Agar.io](http://agar.io/) quickly grasp the core mechanics and objectives.
3. The paper is clearly written, visually well-organized, and supported by detailed figures and appendices that make the environment’s design, components, and experi…

Weaknesses

1. **Continual?** The environment is not truly *continual* in the RL sense. In continual RL, the world itself changes while the agent’s policy persists: new objectives emerge, opponent distributions evolve, and goals shift. In AgarCL, the apparent “change” stems entirely from the agent’s own state: as mass increases, movement slows and the field of view expands, altering the interaction dynamics. These effects are endogenous and fully captured by a single stationary MDP, while the transition and

Code & Models

Repositories

machado-research/AgarCL (Official)

Taxonomy

Methods: Average Pooling · Q-Learning · Convolution · Dilated Convolution · Global Average Pooling · 1x1 Convolution · Dense Connections · Deep Q-Network · Entropy Regularization · Switchable Atrous Convolution