Scaling Multi-Agent Environment Co-Design with Diffusion Models
Hao Xiang Li, Michael Amir, Amanda Prorok

TL;DR
This paper introduces DiCoDe, a scalable and sample-efficient framework for multi-agent environment co-design using diffusion models, significantly improving performance in complex benchmarks.
Contribution
The paper presents DiCoDe, a novel diffusion-based co-design method with Projected Universal Guidance and critic distillation, enabling scalable and efficient joint optimization.
Findings
Achieves 39% higher rewards in warehouse automation.
Uses 66% fewer simulation samples than previous methods.
Outperforms state-of-the-art methods on multiple multi-agent co-design benchmarks.
Abstract
The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising…
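As a rough illustration of the idea behind guided sampling under constraints, the sketch below runs a toy Langevin-style sampler that adds a reward gradient to each update and then projects the iterate back onto a feasible set. It is not the paper's PUG algorithm. The standard-normal prior score, the quadratic reward, the box constraint, and every name (`toy_reward`, `toy_prior_score`, `project`, `projected_guided_sample`) are assumptions made for this example. For simplicity the guidance gradient is evaluated on the current iterate rather than on a predicted clean sample.

```python
import numpy as np

LO, HI = -1.0, 1.0   # hypothetical feasible box for each design variable
TARGET = 2.0         # toy reward peak, deliberately placed outside the box


def toy_reward(x):
    """Toy stand-in for a learned environment critic; peaks at x = TARGET."""
    return -0.5 * float(np.sum((x - TARGET) ** 2))


def toy_reward_grad(x):
    """Analytic gradient of toy_reward with respect to x."""
    return TARGET - x


def toy_prior_score(x):
    """Score of a standard normal; stands in for a trained diffusion model."""
    return -x


def project(x):
    """Euclidean projection onto the box [LO, HI]^d (the toy constraint set)."""
    return np.clip(x, LO, HI)


def projected_guided_sample(guidance_scale, steps=50, dim=4,
                            step_size=0.1, noise_scale=0.1, seed=0):
    """Guided update followed by projection, with noise annealed towards zero."""
    rng = np.random.default_rng(seed)
    x = project(rng.standard_normal(dim))
    for k in range(steps):
        t = 1.0 - k / steps  # noise level, decreasing from 1 to 1/steps
        drift = toy_prior_score(x) + guidance_scale * toy_reward_grad(x)
        noise = noise_scale * t * rng.standard_normal(dim)
        x = project(x + step_size * drift + noise)  # keep the sample feasible
    return x


guided = projected_guided_sample(guidance_scale=2.0)
unguided = projected_guided_sample(guidance_scale=0.0)
print("guided reward:", toy_reward(guided))
print("unguided reward:", toy_reward(unguided))
```

In this toy setting the guided samples are pushed towards the reward peak but stay inside the box because of the projection step, whereas the unguided samples remain near the prior mode.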
Peer Reviews
Decision · Submitted to ICLR 2026
I think introducing diffusion models to optimize the environment and policy in a multi-agent reinforcement learning problem is a clear strength. The authors also introduce two key insights: the projection operator used when generating the environment, and the use of the agent critic estimates when optimizing the environment critic function.
I do not see a major weakness for this paper.
1. The paper introduces Diffusion Co-Design (DiCoDe), a new framework for _jointly_ optimising multi-agent policies and environment parameters, a direction previously limited by scalability and sample inefficiency.
2. The paper proposes Projected Universal Guidance (PUG), a principled sampling technique that merges universal guidance and projected diffusion models, enabling generation of high-reward, constraint-satisfying environments. This addresses the infeasibility of standard classif
1. The paper lacks a formal theoretical analysis of convergence or stability for the co-design process. While the diffusion-based sampling and critic distillation are empirically well motivated, there is no formal justification (e.g., fixed-point or equilibrium guarantees) that the joint optimisation of agents and environment converges to any optimal co-design solution. The method’s stability arguments are heuristic, mainly relying on decoupling the generator (fixed diffusion model) from the a
- **Originality**: DiCoDe is the first method to apply guided diffusion models to multi-agent environment co-design, and it introduces PUG, a novel constraint-aware sampling technique that combines universal guidance with projected constraints.
- **Quality**: The paper provides strong empirical results across three diverse domains, showing consistent improvements over baselines. The critic distillation mechanism is well-motivated and addresses polic
### W1. **Scalability vs. Algorithm Design is Unclear**
- While the paper claims scalability, the computational cost of guided diffusion increases with the number of agents and environment dimensionality.
- No complexity analysis is provided for PUG sampling or critic distillation w.r.t. agent count.
- **Missing experiment**: Scaling curves with increasing agents (e.g., 4→16→32) to quantify how wall-clock time or memory grows.

### W2.
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Adaptive Dynamic Programming Control
