TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games
Zhaoqilin Yang, Xin Wang, Ruichen Zhang, Chanchan Li, Youliang Tian

TL;DR
TUC-PPO is a novel reinforcement learning framework that explicitly incorporates team welfare constraints into policy optimization, leading to faster convergence and more stable cooperation in spatial public goods games.
Contribution
The paper introduces a bi-level constrained optimization approach within PPO that integrates team utility objectives to improve multi-agent cooperation.
Findings
Outperforms standard PPO and evolutionary baselines.
Achieves faster convergence to cooperative states.
Enhances stability against defector invasion.
Abstract
We introduce Team Utility-Constrained Proximal Policy Optimization (TUC-PPO), a new deep reinforcement learning framework. It extends Proximal Policy Optimization (PPO) by integrating team welfare objectives specifically for spatial public goods games. Unlike conventional approaches where cooperation emerges indirectly from individual rewards, TUC-PPO instead optimizes a bi-level objective integrating policy gradients and team utility constraints. Consequently, all policy updates explicitly incorporate collective payoff thresholds. The framework preserves PPO's policy gradient core while incorporating constrained optimization through adaptive Lagrangian multipliers. Therefore, decentralized agents dynamically balance selfish and cooperative incentives. The comparative analysis demonstrates superior performance of this constrained deep reinforcement learning approach compared to…
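The abstract describes a PPO objective combined with a team utility constraint enforced through adaptive Lagrangian multipliers. The summary does not give the exact loss, so the following is only a minimal sketch of how such a constrained objective is commonly written. It assumes the constraint takes the form "mean team utility >= threshold". The names `clipped_surrogate`, `constrained_loss`, `update_multiplier`, `threshold`, and `lam`, along with all numeric values, are hypothetical and not taken from the paper.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over the batch."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))


def constrained_loss(ratio, advantage, team_utility, threshold, lam, eps=0.2):
    """Lagrangian loss to minimize (assumed form, not the paper's exact loss).

    violation > 0 means the mean team utility is below the threshold.
    This sketch only evaluates scalars; in a real implementation the team
    utility term would have to depend on the policy parameters so that the
    penalty contributes a gradient.
    """
    violation = threshold - np.mean(team_utility)
    loss = -clipped_surrogate(ratio, advantage, eps) + lam * violation
    return loss, violation


def update_multiplier(lam, violation, lr=0.05):
    """Dual ascent step on the multiplier, projected onto lam >= 0."""
    return max(0.0, lam + lr * violation)


# Toy batch of three samples (illustrative values only).
ratio = np.array([1.0, 1.5, 0.5])          # pi_new / pi_old
advantage = np.array([1.0, 1.0, -1.0])     # advantage estimates
team_utility = np.array([0.5, 0.7, 0.6])   # per-sample team payoff

lam = 0.5
loss, violation = constrained_loss(ratio, advantage, team_utility,
                                   threshold=1.0, lam=lam)
lam = update_multiplier(lam, violation)
```

In this sketch the multiplier grows while the team utility stays below the threshold, which increases the weight of the collective term relative to the individual surrogate, and it shrinks back toward zero once the constraint is satisfied.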
