Co2PO: Coordinated Constrained Policy Optimization for Multi-Agent RL

Shrenik Patel; Christine Truong

arXiv:2602.02970·cs.LG·February 4, 2026

TL;DR

Co2PO introduces a communication-based framework for multi-agent reinforcement learning that proactively manages safety constraints by forecasting hazards, enabling better exploration and higher returns without over-conservatism.

Contribution

The paper presents a communication-augmented approach to constrained multi-agent RL that combines hazard prediction with risk-aware optimization, improving both safety and performance.

Findings

1. Achieves higher returns than baseline methods on safety benchmarks.

2. Converges to cost-compliant policies at deployment.

3. Validates importance of risk-triggered communication and adaptive gating.

Abstract

Constrained multi-agent reinforcement learning (MARL) faces a fundamental tension between exploration and safety-constrained optimization. Existing leading approaches, such as Lagrangian methods, typically rely on global penalties or centralized critics that react to violations only after they occur, often suppressing exploration and leading to over-conservatism. We propose Co2PO, a novel communication-augmented MARL framework that enables coordination-driven safety through selective, risk-aware communication. Co2PO introduces a shared blackboard architecture for broadcasting positional intent and yield signals, governed by a learned hazard predictor that proactively forecasts potential violations over an extended temporal horizon. By integrating these forecasts into a constrained optimization objective, Co2PO allows agents to anticipate and navigate collective hazards without the…
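
To make the blackboard idea in the abstract concrete, here is a minimal Python sketch of risk-gated broadcasting. It is an illustration under stated assumptions, not the authors' implementation: the names `Message`, `Blackboard`, and `maybe_broadcast`, the fixed `threshold` and `yield_threshold` values, and the scalar `hazard_prob` input are all hypothetical. In the paper the hazard forecast comes from a learned predictor; here it is simply passed in as a number.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Message:
    """One blackboard entry: an intended position plus a yield flag (hypothetical layout)."""
    intent: Tuple[float, float]
    yield_flag: bool


@dataclass
class Blackboard:
    """Shared store readable by every agent, keyed by agent id."""
    entries: Dict[int, Message] = field(default_factory=dict)

    def post(self, agent_id: int, msg: Message) -> None:
        # Overwrites any earlier message from the same agent.
        self.entries[agent_id] = msg

    def read_others(self, agent_id: int) -> Dict[int, Message]:
        # An agent sees every entry except its own.
        return {k: v for k, v in self.entries.items() if k != agent_id}


def maybe_broadcast(
    board: Blackboard,
    agent_id: int,
    intent: Tuple[float, float],
    hazard_prob: float,
    threshold: float = 0.5,        # assumed gate; the paper describes adaptive gating
    yield_threshold: float = 0.8,  # assumed level at which the agent offers to yield
) -> bool:
    """Post to the blackboard only when the forecast hazard reaches the gate."""
    if hazard_prob < threshold:
        return False  # low risk: stay silent
    board.post(agent_id, Message(intent=intent, yield_flag=hazard_prob >= yield_threshold))
    return True


if __name__ == "__main__":
    board = Blackboard()
    maybe_broadcast(board, 0, (1.0, 2.0), hazard_prob=0.9)  # high risk: posts and yields
    maybe_broadcast(board, 1, (3.0, 2.0), hazard_prob=0.1)  # low risk: no message
    print(board.read_others(1))
```

In this sketch agent 1 reads agent 0's intent and yield flag, while agent 0 sees nothing because agent 1 never crossed the gate. A fixed threshold stands in for the learned, adaptive gate the abstract mentions.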

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications