MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, and Alec Koppel

arXiv:2106.00543·stat.ML·June 25, 2021·5 cites

TL;DR

This paper introduces DSAC, a decentralized actor-critic algorithm for multi-agent reinforcement learning that optimizes general utilities, which can encode risk-sensitivity and exploration in addition to cumulative return, and guarantees convergence to global optima.

Contribution

The paper proposes a novel decentralized algorithm for MARL that handles general utilities and proves convergence to globally optimal policies.

Findings

01

DSAC converges to an $\epsilon$-stationary point in $\tilde{O}(1/\epsilon^{2.5})$ steps.

02

Faster convergence of $\tilde{O}(1/\epsilon^{2})$ with increased communication.

03

Experiments show benefits of using general utilities beyond cumulative return.

Abstract

We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC) in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward". DSAC converges to $\epsilon$-stationarity in…
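To make the "shadow reward" idea concrete, here is a minimal Python sketch of the two critic-side estimates and the mixing step described in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the tabular setting, the entropy utility $F(\lambda) = -\sum \lambda \log \lambda$ (one possible exploration objective), the smoothing constant `eps`, and the function names `empirical_occupancy`, `entropy_shadow_reward`, and `mix` are all hypothetical choices made for this example.

```python
import math


def empirical_occupancy(trajectory, n_states, n_actions, gamma):
    """Estimate a discounted state-action occupancy measure from one trajectory.

    Each visited (state, action) pair at time t contributes (1 - gamma) * gamma**t.
    """
    occ = [[0.0] * n_actions for _ in range(n_states)]
    weight = 1.0
    for s, a in trajectory:
        occ[s][a] += (1.0 - gamma) * weight
        weight *= gamma
    return occ


def entropy_shadow_reward(occ, eps=1e-8):
    """Derivative of the assumed utility F(lam) = -sum(lam * log(lam)).

    The partial derivative w.r.t. each entry is -(log(lam) + 1); eps (an
    assumption of this sketch) avoids log(0) for unvisited pairs.
    """
    return [[-(math.log(x + eps) + 1.0) for x in row] for row in occ]


def mix(values, weights):
    """One round of weighted averaging: agent i receives sum_j W[i][j] * values[j]."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]


# Toy usage: one short trajectory on a 2-state, 2-action problem.
trajectory = [(0, 0), (1, 1), (0, 0)]
occupancy = empirical_occupancy(trajectory, n_states=2, n_actions=2, gamma=0.9)
shadow = entropy_shadow_reward(occupancy)

# Two agents average scalar estimates with uniform mixing weights.
mixed = mix([1.0, 3.0], [[0.5, 0.5], [0.5, 0.5]])
```

In this toy run, rarely visited state-action pairs receive a larger shadow reward than frequently visited ones, which is how an entropy-style utility would encourage exploration. The scalars passed to `mix` stand in for whatever quantities agents exchange with neighbors; the abstract does not specify them in the portion shown here.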

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Auction Theory and Applications