TL;DR
This paper introduces AgarCL, a novel continual reinforcement learning platform based on the game Agar.io, to evaluate agents in a complex, evolving environment with stochastic dynamics and partial observability.
Contribution
The paper presents AgarCL, a new high-dimensional, non-episodic environment for continual RL research, along with benchmark results and analysis of existing continual learning methods.
Findings
Standard RL algorithms perform comparably to specialized continual learning methods in AgarCL.
AgarCL's environment complexity challenges current continual RL approaches.
The paper reports benchmark results for DQN, PPO, and SAC on AgarCL and its sub-tasks.
Abstract
Continual reinforcement learning (RL) concerns agents that are expected to learn continually, rather than converge to a policy that is then fixed for evaluation. This setting is well-suited to environments that the agent perceives as changing over time, rendering any static policy ineffective. In continual RL, researchers often simulate such changes either by modifying episodic environments to incorporate task shifts during interaction or by designing simulators that explicitly model continual dynamics. However, transforming episodic problems into continual ones primarily captures scenarios involving abrupt changes in the data stream and still relies on episodic structure. Meanwhile, the few simulators explicitly designed for empirical continual RL research are often limited in scope or complexity. In this paper, we introduce AgarCL, a research platform for continual RL that enables…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The manuscript targets a gap in CRL benchmarks by providing a non-episodic environment with smooth endogenous non-stationarity, moving beyond artificial task switches common in prior work.
2. The environment’s combination of partial observability, continuous control, high-dimensional observations, and potentially infinite horizon is well aligned with realistic continual learning challenges.
3. The manuscript provides a detailed description of the platform and its dynamics, which can help rese
1. The originality of the platform is limited, as it largely adapts Agar.io for RL without clear evidence of novel environment design beyond configuration.
2. The evaluation does not include algorithms specifically designed for CRL (or methods targeting non-stationarity, online adaptation, memory consolidation, or meta-learning), making it hard to assess whether the environment differentiates among approaches intended for this setting.
3. The baselines (DQN, PPO, SAC) are standard and not state-
The paper presents a clear and well-structured formalization of the problem, with solid motivation and coherent methodology. The presentation is generally good, and the theoretical framing is appealing.
The contribution lacks strong novelty, as it mostly adapts an existing game setup rather than introducing new concepts. The analysis of opponent policies could be expanded, for example by addressing non-stationary behaviors. While the focus is not on continual learning (CL), it would be valuable to highlight the method’s compatibility with existing CL frameworks. Finally, a few figures and tables would benefit from clearer legends for better readability.
1. The rewards and hybrid action space (mimicking cursor control with discrete actions) are well designed and align with the gameplay dynamics.
2. The accompanying video provides a clear, intuitive overview of the environment, helping readers unfamiliar with [Agar.io](http://agar.io/) quickly grasp the core mechanics and objectives.
3. The paper is clearly written, visually well-organized, and supported by detailed figures and appendices that make the environment’s design, components, and experi
1. **Continual?** The environment is not truly *continual* in the RL sense. In continual RL, the world itself changes while the agent’s policy persists: new objectives emerge, opponent distributions evolve, and goals shift. In AgarCL, the apparent “change” stems entirely from the agent’s own state: as mass increases, movement slows and the field of view expands, altering the interaction dynamics. These effects are endogenous and fully captured by a single stationary MDP, while the transition and
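The reviewer's point about endogenous change can be illustrated with a toy sketch. The code below is not AgarCL's implementation: the one-dimensional world, the `speed` and `view_radius` formulas, and the 0.5 pellet probability are all hypothetical, chosen only to show that dynamics which depend on the agent's mass can still be written as a single stationary transition function, since `step` never receives a time index.

```python
import math
import random


def speed(mass: float) -> float:
    """Hypothetical rule: heavier agents move more slowly."""
    return 1.0 / math.sqrt(mass)


def view_radius(mass: float) -> float:
    """Hypothetical rule: heavier agents observe a wider area."""
    return 10.0 + math.sqrt(mass)


def step(state, action, rng):
    """One transition of a toy 1-D world, with state = (position, mass).

    The function depends only on (state, action) and random noise,
    never on the time step, so the transition distribution is stationary.
    """
    position, mass = state
    position += action * speed(mass)
    # Stochastic pellet pickup (assumed probability 0.5); the mass gained
    # doubles as the reward in this sketch.
    gained = 1.0 if rng.random() < 0.5 else 0.0
    return (position, mass + gained), gained


def rollout(steps: int, seed: int = 0):
    """Always move right and record the speed available at each step."""
    rng = random.Random(seed)
    state = (0.0, 1.0)
    speeds = []
    for _ in range(steps):
        speeds.append(speed(state[1]))
        state, _ = step(state, 1, rng)
    return state, speeds


if __name__ == "__main__":
    final_state, speeds = rollout(50)
    print("final mass:", final_state[1])
    print("first and last speed:", speeds[0], speeds[-1])
```

Under these assumptions, the speed available to the agent shrinks as mass accumulates, so the agent experiences drifting interaction dynamics even though the underlying transition rule is fixed. Whether that qualifies as continual is the question the review raises.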
Taxonomy
Methods: Average Pooling · Q-Learning · Convolution · Dilated Convolution · Global Average Pooling · 1x1 Convolution · Dense Connections · Deep Q-Network · Entropy Regularization · Switchable Atrous Convolution
