TL;DR
This paper introduces CL-MARL, a dynamic curriculum framework for multi-agent reinforcement learning that adapts difficulty online and uses counterfactual advantage estimation to improve policy generalization in non-stationary environments.
Contribution
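It proposes an adaptive curriculum method that breaks environmental meta-stationarity (training at a single fixed difficulty for the whole run). The method pairs a stable difficulty scheduler, FlexDiff, with a counterfactual advantage technique that counters the non-stationarity and sparse global rewards that a moving curriculum introduces.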
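The summary does not spell out CL-MARL's counterfactual advantage technique (the abstract below is truncated at that point). As a stand-in, the sketch shows the well-known COMA-style counterfactual baseline (Foerster et al., 2018): an agent's advantage is the centralized critic's value for its taken action minus the policy-weighted average over its alternative actions, with the other agents' actions held fixed. This is an illustration of the general idea under that assumption, not the paper's method; the function name and argument layout are hypothetical.

```python
def counterfactual_advantage(q_values, policy_probs, taken_action):
    """Hypothetical COMA-style counterfactual advantage for a single agent.

    q_values[k]     -- critic estimate Q(s, (u^{-a}, k)): joint action with this
                       agent's action swapped to k, other agents' actions fixed
    policy_probs[k] -- this agent's policy probability pi^a(k | tau^a)
    taken_action    -- index of the action the agent actually took
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    if abs(sum(policy_probs) - 1.0) > 1e-6:
        raise ValueError("policy_probs must sum to 1")
    # Counterfactual baseline: expected Q when only this agent's action varies.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    # Credit assigned to this agent's choice relative to its own average behavior.
    return q_values[taken_action] - baseline


# Example: the taken action (index 1) beats the policy-weighted baseline of 2.3.
adv = counterfactual_advantage([1.0, 3.0, 2.0], [0.2, 0.5, 0.3], taken_action=1)
```

Because the baseline marginalizes out only one agent's action, it isolates that agent's contribution to a shared (possibly sparse) team reward, which is the credit-assignment problem the summary attributes to a moving curriculum.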
Findings
CL-MARL achieves a 40% win rate on hard SMAC maps, surpassing baselines.
The approach reaches its peak win rate more than 1.2 times faster.
It demonstrates improved policy robustness and generalization in non-stationary multi-agent settings.
Abstract
Multi-agent reinforcement learning (MARL) has reached competitive performance on cooperative tasks against scripted adversaries, yet most methods train agents at a single fixed difficulty throughout the entire run. We term this static-difficulty regime environmental meta-stationarity and show that it caps policy generalization and steers learning toward shallow local optima. To break this regime, we propose CL-MARL, a dynamic curriculum learning framework that adapts opponent strength online from win-rate signals, advancing or regressing the task as agents master it. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation returns, yielding stable difficulty transitions without manual tuning. Because a moving curriculum amplifies non-stationarity and sparsifies global rewards, we introduce the Counterfactual…
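The abstract describes FlexDiff only at a high level: a momentum-based trend estimate combined with sliding-window monitoring of both the training and evaluation curves, used to advance or regress opponent difficulty from win-rate signals. The sketch below is one plausible reading of that description, not the authors' implementation. The class name, the number of difficulty levels, the window size, the momentum coefficient, and the advance/regress thresholds are all hypothetical choices.

```python
from collections import deque


class FlexDiffSketch:
    """Hypothetical difficulty scheduler loosely following the abstract's
    description of FlexDiff. All numeric defaults are assumptions."""

    def __init__(self, num_levels=5, window=5, momentum=0.9,
                 advance_at=0.7, regress_at=0.3):
        self.level = 0                        # current opponent difficulty index
        self.num_levels = num_levels
        self.momentum = momentum
        self.advance_at = advance_at          # both curves must sit above this to advance
        self.regress_at = regress_at          # both curves must sit below this to regress
        self.train_wr = deque(maxlen=window)  # sliding window of training win rates
        self.eval_wr = deque(maxlen=window)   # sliding window of evaluation win rates
        self.trend = 0.0                      # momentum-smoothed change in eval win rate

    def update(self, train_win_rate, eval_win_rate):
        """Record one training/evaluation win-rate pair; return the difficulty level."""
        if self.eval_wr:
            delta = eval_win_rate - self.eval_wr[-1]
            self.trend = self.momentum * self.trend + (1 - self.momentum) * delta
        self.train_wr.append(train_win_rate)
        self.eval_wr.append(eval_win_rate)
        if len(self.eval_wr) < self.eval_wr.maxlen:
            return self.level  # not enough evidence at this level yet
        train_mean = sum(self.train_wr) / len(self.train_wr)
        eval_mean = sum(self.eval_wr) / len(self.eval_wr)
        # Dual-curve check: require agreement between training and evaluation curves.
        if min(train_mean, eval_mean) >= self.advance_at and self.trend >= 0:
            self._shift(+1)
        elif max(train_mean, eval_mean) <= self.regress_at and self.trend <= 0:
            self._shift(-1)
        return self.level

    def _shift(self, step):
        new_level = min(max(self.level + step, 0), self.num_levels - 1)
        if new_level != self.level:
            self.level = new_level
            # Statistics gathered at the old level no longer apply; start fresh.
            self.train_wr.clear()
            self.eval_wr.clear()
            self.trend = 0.0


sched = FlexDiffSketch()
levels = [sched.update(0.8, 0.8) for _ in range(5)]  # advances on the 5th update
```

Requiring both curves to agree guards against advancing on a training-only improvement that has not yet generalized, and the gap between `advance_at` and `regress_at`, together with clearing the windows after each shift, adds hysteresis so difficulty does not oscillate between adjacent levels.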
