Scaling Multi-Agent Environment Co-Design with Diffusion Models

Hao Xiang Li; Michael Amir; Amanda Prorok

arXiv:2511.03100·cs.LG·November 6, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces DiCoDe, a scalable and sample-efficient framework for multi-agent environment co-design using diffusion models. It reports 39% higher rewards in warehouse automation and 66% fewer simulation samples than previous methods.

Contribution

The paper presents DiCoDe, a novel diffusion-based co-design method with Projected Universal Guidance and critic distillation, enabling scalable and efficient joint optimization.

Findings

01 Achieves 39% higher rewards in warehouse automation.

02 Uses 66% fewer simulation samples than previous methods.

03 Outperforms state-of-the-art in multiple multi-agent co-design benchmarks.

Abstract

The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising…
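
The reviews on this page describe PUG as combining reward guidance with projection onto a constraint set. The sketch below illustrates that general pattern only; it is not the paper's algorithm. Everything in it is a hypothetical stand-in: `prior_score` replaces a trained diffusion model with a toy Gaussian score, `reward_grad` replaces a learned critic with a quadratic reward, and `project_box` assumes the feasible environments form a simple box.

```python
import numpy as np

def prior_score(x, mean=0.5, std=0.3):
    # Score (gradient of log-density) of a toy Gaussian prior.
    # Assumption: stands in for a trained diffusion model.
    return -(x - mean) / std**2

def reward_grad(x, target):
    # Gradient of the toy reward r(x) = -||x - target||^2.
    # Assumption: stands in for the gradient of a learned critic.
    return -2.0 * (x - target)

def project_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d (hypothetical constraint set).
    return np.clip(x, lo, hi)

def projected_guided_sample(dim, target, steps=200, step_size=1e-2,
                            guidance=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    for t in range(steps):
        # Anneal the injected noise linearly to zero by the last step.
        noise_scale = np.sqrt(2.0 * step_size) * (1.0 - (t + 1) / steps)
        # Combine the prior score with the scaled reward gradient.
        drift = prior_score(x) + guidance * reward_grad(x, target)
        x = x + step_size * drift + noise_scale * rng.standard_normal(dim)
        # Project after every update so each iterate stays feasible.
        x = project_box(x)
    return x

sample = projected_guided_sample(dim=4, target=np.full(4, 0.9))
print(sample)
```

Projecting after each step, rather than once at the end, keeps every intermediate sample inside the constraint set, so the reward gradient is only ever evaluated at feasible points. In this toy setting the sample settles between the prior mean and the reward target, and it ends on the box boundary when the target lies outside the box.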

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 4

Strengths

I think introducing diffusion models to optimize both the environment and the policy in a multi-agent reinforcement learning problem is a clear strength. The authors also introduce two key insights: the projection operator used when generating the environment, and the use of agent critic estimates when optimizing the environment critic function.

Weaknesses

I do not see a major weakness for this paper.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper introduces Diffusion Co-Design (DiCoDe), a new framework for _jointly_ optimising multi-agent policies and environment parameters, a direction previously limited by scalability and sample inefficiency.
2. The paper proposes Projected Universal Guidance (PUG), a principled sampling technique that merges universal guidance and projected diffusion models, enabling generation of high-reward, constraint-satisfying environments. This addresses the infeasibility of standard classif

Weaknesses

1. The paper lacks a formal theoretical analysis of convergence or stability for the co-design process. While the diffusion-based sampling and critic distillation are well-motivated empirically, there is no formal justification (e.g., fixed-point or equilibrium guarantees) that the joint optimisation of agents and environment converges to any optimal co-design solution. The method’s stability arguments are heuristic — mainly relying on decoupling the generator (fixed diffusion model) from the a

Reviewer 03 · Rating 2 · Confidence 5

Strengths

- **Originality**: DiCoDe is the **first method to apply guided diffusion models** to **multi-agent environment co-design**, and it introduces **PUG**, a novel constraint-aware sampling technique that combines **universal guidance** with **projected constraints**.
- **Quality**: The paper provides **strong empirical results** across **three diverse domains**, showing **consistent improvements** over baselines. The **critic distillation** mechanism is well-motivated and addresses **polic

Weaknesses

### W1. **Scalability vs. Algorithm Design is Unclear**
- While the paper claims **scalability**, the **computational cost** of **guided diffusion** **increases** with the **number of agents** and **environment dimensionality**.
- **No complexity analysis** is provided for **PUG sampling** or **critic distillation** w.r.t. **agent count**.
- **Missing experiment**: **Scaling curves** with **increasing agents** (e.g., 4→16→32) to **quantify** how **wall-clock time** or **memory** grows.

### W2.

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Adaptive Dynamic Programming Control