# cMALC-D: Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending

**Authors:** Anirudh Satheesh, Keenan Powell, Hua Wei

arXiv: 2508.20818 · 2025-08-29

## TL;DR

cMALC-D is a curriculum learning framework for contextual multi-agent reinforcement learning. It uses large language models to generate the training curriculum and a diversity-based context blending mechanism to create new training scenarios, with the aim of improving generalization and sample efficiency.

## Contribution

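The paper proposes an LLM-guided curriculum learning method for contextual multi-agent RL. It replaces proxy signals that are noisy in multi-agent settings (value estimates, generalized advantage estimates) with LLM-generated curricula and evaluation, and adds a diversity-based context blending mechanism to prevent mode collapse and encourage exploration.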

## Key findings

- cMALC-D outperforms existing curriculum methods in traffic signal control tasks.
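- The diversity-based blending mechanism, which combines features from prior contexts into new training scenarios, is introduced to prevent mode collapse and encourage exploration.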
- The framework improves both generalization and sample efficiency.

## Abstract

Many multi-agent reinforcement learning (MARL) algorithms are trained in fixed simulation environments, making them brittle when deployed in real-world scenarios with more complex and uncertain conditions. Contextual MARL (cMARL) addresses this by parameterizing environments with context variables and training a context-agnostic policy that performs well across all environment configurations. Existing cMARL methods attempt to use curriculum learning to help train and evaluate context-agnostic policies, but they often rely on unreliable proxy signals, such as value estimates or generalized advantage estimates that are noisy and unstable in multi-agent settings due to inter-agent dynamics and partial observability. To address these issues, we propose Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending (cMALC-D), a framework that uses Large Language Models (LLMs) to generate semantically meaningful curricula and provide a more robust evaluation signal. To prevent mode collapse and encourage exploration, we introduce a novel diversity-based context blending mechanism that creates new training scenarios by combining features from prior contexts. Experiments in traffic signal control domains demonstrate that cMALC-D significantly improves both generalization and sample efficiency compared to existing curriculum learning baselines. We provide code at https://github.com/DaRL-LibSignal/cMALC-D.
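The abstract describes context blending only at a high level ("combining features from prior contexts"), so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a context is a numeric feature vector, measures diversity by Euclidean distance, and blends the two most dissimilar prior contexts with a random convex combination. The function name `blend_contexts`, the `alpha_range` parameter, and the example features are all hypothetical.

```python
import numpy as np


def blend_contexts(history, rng, alpha_range=(0.3, 0.7)):
    """Blend the two most dissimilar prior contexts into a new context.

    Assumption: each context is a fixed-length numeric vector and
    diversity is Euclidean distance. The paper may use a different rule.
    """
    contexts = np.asarray(history, dtype=float)
    if contexts.ndim != 2 or len(contexts) < 2:
        raise ValueError("need at least two prior contexts of equal length")

    # Pairwise distances between all prior contexts, shape (n, n).
    diffs = contexts[:, None, :] - contexts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Pick the pair that is farthest apart (the most diverse pair).
    i, j = np.unravel_index(np.argmax(dists), dists.shape)

    # Random convex combination, kept away from the endpoints so the
    # result differs from both parents.
    alpha = rng.uniform(*alpha_range)
    return alpha * contexts[i] + (1.0 - alpha) * contexts[j]


# Hypothetical traffic contexts: (vehicle arrival rate, speed limit).
rng = np.random.default_rng(0)
history = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
blended = blend_contexts(history, rng)
```

In this example the farthest pair is `[0, 0]` and `[10, 10]`, so the blended context lies on the segment between them. If features have very different scales, normalizing them before computing distances would likely be needed so that one feature does not dominate the diversity measure.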

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20818/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20818/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/2508.20818/full.md

---
Source: https://tomesphere.com/paper/2508.20818