Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage

Weiqiang Jin; Yang Liu; Shixiang Tang; Jinhu Qi; Wentao Zhang; Junli Wang; Biao Zhao; and Hongyang Du

arXiv:2506.07548·cs.AI·May 7, 2026


TL;DR

This paper introduces CL-MARL, a dynamic curriculum framework for multi-agent reinforcement learning that adapts difficulty online and uses counterfactual advantage estimation to improve policy generalization in non-stationary environments.

Contribution

The paper proposes an adaptive curriculum method for MARL that pairs a stable difficulty scheduler (FlexDiff) with a counterfactual advantage technique, targeting the performance ceiling imposed by environmental meta-stationarity, i.e. training at a single fixed difficulty.

Findings

01

CL-MARL achieves a 40% win rate on hard SMAC maps, surpassing baselines.

02

The approach reaches its peak win rate more than 1.2 times faster.

03

It demonstrates improved policy robustness and generalization in non-stationary multi-agent settings.

Abstract

Multi-agent reinforcement learning (MARL) has reached competitive performance on cooperative tasks against scripted adversaries, yet most methods train agents at a single fixed difficulty throughout the entire run. We term this static-difficulty regime environmental meta-stationarity and show that it caps policy generalization and steers learning toward shallow local optima. To break this regime, we propose CL-MARL, a dynamic curriculum learning framework that adapts opponent strength online from win-rate signals, advancing or regressing the task as agents master it. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation returns, yielding stable difficulty transitions without manual tuning. Because a moving curriculum amplifies non-stationarity and sparsifies global rewards, we introduce the Counterfactual…
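The scheduling idea described in the abstract (advance or regress difficulty from win-rate signals, using a momentum trend plus sliding windows over training and evaluation curves) can be illustrated with a minimal sketch. This is not the paper's FlexDiff: the class name `WinRateCurriculum`, the thresholds `promote_at` and `demote_at`, the window size, the momentum coefficient `beta`, and the use of win rates for both curves are all assumptions chosen for illustration.

```python
from collections import deque


class WinRateCurriculum:
    """Hypothetical difficulty scheduler (illustrative only, not FlexDiff).

    Tracks sliding windows of training and evaluation win rates plus a
    momentum (exponential moving average) estimate of the evaluation trend,
    and moves an integer difficulty level up or down accordingly.
    """

    def __init__(self, levels=7, start=0, window=5, beta=0.9,
                 promote_at=0.8, demote_at=0.3):
        # All defaults are assumed values, not taken from the paper.
        self.levels = levels
        self.level = start
        self.beta = beta
        self.promote_at = promote_at
        self.demote_at = demote_at
        self.train = deque(maxlen=window)   # training-curve window
        self.evals = deque(maxlen=window)   # evaluation-curve window
        self.trend = 0.0                    # momentum of eval win-rate change
        self._prev_eval = None

    def update(self, train_win_rate, eval_win_rate):
        """Record one measurement and return the (possibly new) level."""
        self.train.append(train_win_rate)
        self.evals.append(eval_win_rate)
        if self._prev_eval is not None:
            delta = eval_win_rate - self._prev_eval
            self.trend = self.beta * self.trend + (1 - self.beta) * delta
        self._prev_eval = eval_win_rate

        # Wait until both windows are full before changing difficulty.
        if len(self.evals) < self.evals.maxlen:
            return self.level

        train_avg = sum(self.train) / len(self.train)
        eval_avg = sum(self.evals) / len(self.evals)

        # Promote only if both curves agree and the trend is not falling.
        if min(train_avg, eval_avg) >= self.promote_at and self.trend >= 0:
            self._move(+1)
        # Demote only if both curves agree and the trend is not rising.
        elif max(train_avg, eval_avg) <= self.demote_at and self.trend <= 0:
            self._move(-1)
        return self.level

    def _move(self, step):
        new_level = min(self.levels - 1, max(0, self.level + step))
        if new_level != self.level:
            self.level = new_level
            # Statistics from the old difficulty no longer apply.
            self.train.clear()
            self.evals.clear()
            self.trend = 0.0
            self._prev_eval = None
```

In a training loop, `update` would be called once per evaluation interval and the returned level mapped to whatever opponent-strength setting the environment exposes. Requiring both windows to agree before a transition is one simple way to avoid oscillating between levels on noisy win rates.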


Code & Models

Repositories

NICE-HKU/CL2MARL-SMAC (GitHub)
