Co2PO: Coordinated Constrained Policy Optimization for Multi-Agent RL
Shrenik Patel, Christine Truong

TL;DR
Co2PO introduces a communication-based framework for multi-agent reinforcement learning that proactively manages safety constraints by forecasting hazards, enabling better exploration and higher returns without over-conservatism.
Contribution
Co2PO presents a communication-augmented approach to constrained multi-agent RL that combines hazard prediction with risk-aware optimization to improve both safety and performance.
Findings
Achieves higher returns than baseline methods on safety benchmarks.
Converges to cost-compliant policies at deployment.
Validates the importance of risk-triggered communication and adaptive gating.
Abstract
Constrained multi-agent reinforcement learning (MARL) faces a fundamental tension between exploration and safety-constrained optimization. Existing leading approaches, such as Lagrangian methods, typically rely on global penalties or centralized critics that react to violations after they occur, often suppressing exploration and leading to over-conservatism. We propose Co2PO, a novel MARL communication-augmented framework that enables coordination-driven safety through selective, risk-aware communication. Co2PO introduces a shared blackboard architecture for broadcasting positional intent and yield signals, governed by a learned hazard predictor that proactively forecasts potential violations over an extended temporal horizon. By integrating these forecasts into a constrained optimization objective, Co2PO allows agents to anticipate and navigate collective hazards without the…
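As a rough illustration of the risk-gated blackboard idea described in the abstract, the following is a minimal sketch, not the authors' implementation. The names `Message`, `Blackboard`, `predicted_hazard`, and `communicate`, the thresholds, and the yield rule are all hypothetical. In particular, the paper's hazard predictor is learned and forecasts over a temporal horizon, whereas this sketch substitutes a hand-written distance heuristic so the example stays self-contained.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    intent: tuple      # intended next (x, y) position (assumed message content)
    yield_flag: bool   # True if the sender offers to yield (assumed rule below)


@dataclass
class Blackboard:
    """Shared store holding the latest message from each agent."""
    messages: dict = field(default_factory=dict)

    def post(self, msg: Message) -> None:
        self.messages[msg.sender] = msg

    def read(self, reader: int) -> list:
        # Every agent sees all messages except its own.
        return [m for s, m in sorted(self.messages.items()) if s != reader]


def predicted_hazard(position, others, radius=2.0):
    """Hypothetical stand-in for a learned hazard predictor.

    Returns a risk score in [0, 1] that grows as the nearest other
    agent gets closer than `radius`. This is a heuristic, not a model.
    """
    if not others:
        return 0.0
    nearest = min(math.hypot(position[0] - x, position[1] - y) for x, y in others)
    return max(0.0, 1.0 - nearest / radius)


def communicate(agent_id, position, intent, others, board,
                threshold=0.5, yield_threshold=0.9):
    """Post to the blackboard only when predicted risk crosses `threshold`."""
    risk = predicted_hazard(position, others)
    if risk < threshold:
        return False  # low risk: stay silent
    board.post(Message(agent_id, intent, yield_flag=risk >= yield_threshold))
    return True


board = Blackboard()
# Agent 0 is close to another agent, so it broadcasts its intent.
sent_0 = communicate(0, (0.0, 0.0), (1.0, 0.0), [(0.5, 0.0)], board)
# Agent 1 is far from everyone, so it does not communicate.
sent_1 = communicate(1, (10.0, 10.0), (11.0, 10.0), [(0.0, 0.0)], board)
```

Gating on a risk score means agents in low-risk states send nothing, which is one plausible reading of "selective, risk-aware communication"; how Co2PO actually trains the gate and feeds forecasts into the constrained objective is not specified in the text above.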
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
