MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization
Boyuan Wu

TL;DR
MAESTRO leverages large language models as offline architects to generate curricula and reward functions, significantly improving cooperative multi-agent reinforcement learning performance without increasing inference costs during deployment.
Contribution
It introduces a novel framework that uses LLMs for offline environment shaping, including curriculum generation and reward synthesis, enhancing MARL training efficiency and effectiveness.
Findings
Achieves a 4.0% higher mean return than the baseline.
Improves risk-adjusted performance, reaching a Sharpe ratio of 1.53.
Demonstrates effectiveness in large-scale traffic signal control.
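This summary does not say how the two headline metrics were computed. As a minimal sketch, assuming the mean-return gain is a relative improvement over the baseline and the Sharpe ratio is the mean of per-episode returns divided by their sample standard deviation (with a reference return of zero), the arithmetic would look like the following. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import statistics


def relative_improvement(baseline_mean: float, method_mean: float) -> float:
    """Percentage gain of method_mean over baseline_mean (assumed definition)."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return (method_mean - baseline_mean) / abs(baseline_mean) * 100.0


def sharpe_ratio(returns: list[float], reference: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation.

    Assumption: `reference` plays the role of the risk-free rate and
    defaults to zero; the paper may use a different convention.
    """
    excess = [r - reference for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread


# Illustrative numbers only, not results from the paper.
gain = relative_improvement(baseline_mean=100.0, method_mean=104.0)
ratio = sharpe_ratio([1.0, 2.0, 3.0])
```

A higher Sharpe ratio under this definition means the returns are large relative to their episode-to-episode variability, which is one common way to read "risk-adjusted performance".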
Abstract
Cooperative Multi-Agent Reinforcement Learning (MARL) faces two major design bottlenecks: crafting dense reward functions and constructing curricula that avoid local optima in high-dimensional, non-stationary environments. Existing approaches rely on fixed heuristics or use Large Language Models (LLMs) directly in the control loop, which is costly and unsuitable for real-time systems. We propose MAESTRO (Multi-Agent Environment Shaping through Task and Reward Optimization), a framework that moves the LLM outside the execution loop and uses it as an offline training architect. MAESTRO introduces two generative components: (i) a semantic curriculum generator that creates diverse, performance-driven traffic scenarios, and (ii) an automated reward synthesizer that produces executable Python reward functions adapted to evolving curriculum difficulty. These components guide a standard MARL…
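To make the idea of an "executable Python reward function adapted to evolving curriculum difficulty" more concrete, here is a minimal sketch of what such a synthesized function might look like for one intersection. Everything in it is an assumption for illustration: the `IntersectionState` fields, the `difficulty` parameter, and the weights are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class IntersectionState:
    """Hypothetical per-intersection observation used by the reward."""
    queue_lengths: list[float]  # vehicles queued on each incoming lane
    mean_wait_s: float          # mean waiting time in seconds
    throughput: int             # vehicles that cleared the junction this step


def shaped_reward(state: IntersectionState, difficulty: float) -> float:
    """Dense reward whose waiting-time weight grows with curriculum difficulty.

    Assumption: `difficulty` lies in [0, 1], where 0 is the easiest
    curriculum stage and 1 is the hardest.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    queue_penalty = 0.5 * sum(state.queue_lengths)
    # Harder stages penalize waiting time more heavily (illustrative weights).
    wait_weight = 0.1 + 0.4 * difficulty
    wait_penalty = wait_weight * state.mean_wait_s
    return state.throughput - queue_penalty - wait_penalty


# Illustrative call with made-up numbers.
example = IntersectionState(queue_lengths=[2.0, 4.0], mean_wait_s=10.0, throughput=5)
easy_reward = shaped_reward(example, difficulty=0.0)
hard_reward = shaped_reward(example, difficulty=1.0)
```

Because a function like this is generated offline and then called as ordinary Python during training, no LLM query is needed in the control loop, which matches the deployment-cost claim in the TL;DR.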
Taxonomy
Topics: Traffic control and management · Traffic Prediction and Management Techniques · Vehicular Ad Hoc Networks (VANETs)
