MARL with General Utilities via Decentralized Shadow Reward Actor-Critic
Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, and Alec Koppel

TL;DR
This paper introduces DSAC, a decentralized actor-critic algorithm for multi-agent reinforcement learning that optimizes general utilities, such as risk-sensitive and exploration objectives, and comes with guarantees of convergence to global optima.
Contribution
The paper proposes a novel decentralized algorithm for MARL that handles general utilities and proves convergence to globally optimal policies.
Findings
DSAC converges to an $\epsilon$-stationary point in $\tilde{O}(1/\epsilon^{2.5})$ steps.
Faster convergence of $\tilde{O}(1/\epsilon^{2})$ with increased communication.
Experiments show benefits of using general utilities beyond cumulative return.
Abstract
We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC) in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward". DSAC converges to $\epsilon$-stationarity in…
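The occupancy-measure and shadow-reward steps described in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation: the function names (`empirical_occupancy`, `entropy_utility`, `shadow_reward`, `mix`), the single-trajectory estimator, the choice of an entropy utility as the exploration objective, and the mixing matrix `W` are all assumptions made for illustration only.

```python
import numpy as np

def empirical_occupancy(trajectory, n_states, n_actions, gamma=0.9):
    """Discounted empirical state-action occupancy from one trajectory.

    `trajectory` is a list of (state, action) index pairs. This simple
    estimator is an assumption, not the estimator used in the paper.
    """
    lam = np.zeros((n_states, n_actions))
    for t, (s, a) in enumerate(trajectory):
        lam[s, a] += (1.0 - gamma) * gamma ** t
    return lam

def entropy_utility(lam, eps=1e-8):
    """Illustrative general utility: entropy of the occupancy measure."""
    return -np.sum(lam * np.log(lam + eps))

def shadow_reward(lam, eps=1e-8):
    """Derivative of entropy_utility with respect to each entry of lam.

    The result has one value per (state, action) pair and plays the role
    of a reward table in this sketch.
    """
    return -(np.log(lam + eps) + lam / (lam + eps))

def mix(values, W):
    """One round of weighted averaging with neighbors.

    W is assumed to be a doubly stochastic matrix matching the
    communication graph; row i holds the weights agent i gives to others.
    """
    return W @ values

# Usage: one agent's trajectory in a 3-state, 2-action problem.
trajectory = [(0, 1), (1, 0), (0, 1), (2, 1)]
lam = empirical_occupancy(trajectory, n_states=3, n_actions=2)
r_shadow = shadow_reward(lam)  # larger for rarely visited pairs

# Usage: three agents average a scalar quantity with their neighbors.
W = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.25, 0.25],
              [0.0, 0.25, 0.75]])
mixed = mix(np.array([1.0, 2.0, 6.0]), W)
```

With the entropy utility, the shadow reward is largest for state-action pairs the agent has rarely visited, which is one way a general utility can encode exploration. Because `W` is doubly stochastic, the mixing step preserves the network-wide average of the mixed quantity.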
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Auction Theory and Applications
