MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization

Boyuan Wu

arXiv:2511.19253 · cs.LG · December 11, 2025

Open Access

TL;DR

MAESTRO leverages large language models as offline architects to generate curricula and reward functions, significantly improving cooperative multi-agent reinforcement learning performance without increasing inference costs during deployment.

Contribution

MAESTRO introduces a framework that uses LLMs for offline environment shaping, through curriculum generation and reward synthesis, to improve MARL training efficiency and effectiveness.

Findings

01

Achieves a 4.0% higher mean return than the baseline.

02

Improves risk-adjusted performance, reaching a Sharpe ratio of 1.53.

03

Demonstrates effectiveness in large-scale traffic signal control.
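
The findings above report a Sharpe ratio but this page does not say how it is computed. A minimal sketch, assuming the common convention of mean excess return divided by the sample standard deviation of per-episode returns; the function name, the `baseline` parameter, and the sample numbers are illustrative and not taken from the paper:

```python
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], baseline: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation.

    Assumption: `returns` holds one total return per evaluation episode and
    `baseline` is a constant reference return. The paper may define both
    differently.
    """
    excess = [r - baseline for r in returns]
    spread = statistics.stdev(excess)  # needs at least two episodes
    if spread == 0.0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return statistics.mean(excess) / spread


# Hypothetical episode returns, for illustration only.
episode_returns = [1.0, 2.0, 3.0]
ratio = sharpe_ratio(episode_returns)
```

A higher ratio means the policy earns its return with less episode-to-episode variation, which is why it is read as a risk-adjusted measure.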

Abstract

Cooperative Multi-Agent Reinforcement Learning (MARL) faces two major design bottlenecks: crafting dense reward functions and constructing curricula that avoid local optima in high-dimensional, non-stationary environments. Existing approaches rely on fixed heuristics or use Large Language Models (LLMs) directly in the control loop, which is costly and unsuitable for real-time systems. We propose MAESTRO (Multi-Agent Environment Shaping through Task and Reward Optimization), a framework that moves the LLM outside the execution loop and uses it as an offline training architect. MAESTRO introduces two generative components: (i) a semantic curriculum generator that creates diverse, performance-driven traffic scenarios, and (ii) an automated reward synthesizer that produces executable Python reward functions adapted to evolving curriculum difficulty. These components guide a standard MARL…
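
The abstract describes a reward synthesizer that emits executable Python reward functions adapted to curriculum difficulty. The sketch below shows one way such generated source could be loaded and sanity-checked offline before training. Everything in it is an assumption made for illustration: the `reward(obs, difficulty)` signature, the observation keys `queue_length` and `mean_wait`, and the hard-coded `CANDIDATE_SOURCE` string that stands in for an LLM response. It is not the paper's implementation.

```python
import math
from typing import Callable, Dict

RewardFn = Callable[[Dict[str, float], float], float]

# Hypothetical stand-in for LLM output: source text of a reward function.
CANDIDATE_SOURCE = '''
def reward(obs, difficulty):
    # Penalise queues and waiting; weight waiting more as difficulty rises.
    return -(obs["queue_length"] + (1.0 + difficulty) * obs["mean_wait"])
'''


def load_reward(source: str) -> RewardFn:
    """Compile generated source and return its `reward` function."""
    namespace: dict = {}
    # Empty builtins limit what the snippet can call. This is NOT a real
    # sandbox; untrusted code should run in an isolated process.
    exec(source, {"__builtins__": {}}, namespace)
    fn = namespace.get("reward")
    if not callable(fn):
        raise ValueError("source does not define reward(obs, difficulty)")
    return fn


def validate(fn: RewardFn) -> bool:
    """Reject candidates that crash or return non-finite values on probes."""
    probes = [
        ({"queue_length": 0.0, "mean_wait": 0.0}, 0.0),
        ({"queue_length": 12.0, "mean_wait": 30.0}, 0.5),
        ({"queue_length": 40.0, "mean_wait": 90.0}, 1.0),
    ]
    for obs, difficulty in probes:
        try:
            value = fn(obs, difficulty)
        except Exception:
            return False
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            return False
    return True


reward_fn = load_reward(CANDIDATE_SOURCE)
is_usable = validate(reward_fn)
```

Because loading and validation happen before training starts, a rejected candidate costs only another offline generation attempt, and no LLM call is needed when the trained policy is deployed.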

Taxonomy

Topics: Traffic control and management · Traffic Prediction and Management Techniques · Vehicular Ad Hoc Networks (VANETs)