Cooperative Actor-Critic via TD Error Aggregation
Martin Figura, Yixuan Lin, Ji Liu, Vijay Gupta

TL;DR
This paper introduces a privacy-preserving decentralized actor-critic algorithm for multi-agent reinforcement learning that tolerates communication delays and packet dropouts, at a communication cost that grows quadratically with network size.
Contribution
It proposes a novel TD error aggregation method that maintains privacy, handles unreliable communication, and scales efficiently in large agent networks.
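A minimal sketch of why aggregating TD errors can keep local data private follows. It assumes, rather than takes from the paper, that each agent i keeps a private reward r_i and critic V_i, and that the team objective uses their averages. By linearity, the average of the local TD errors δ_i = r_i + γV_i(s') − V_i(s) equals the TD error of the team-average reward and value. Agents therefore only need to exchange the scalars δ_i, never r_i or V_i. The function names and γ = 0.95 are illustrative choices.

```python
GAMMA = 0.95  # illustrative discount factor, not taken from the paper


def local_td_error(r_i, v_i_s, v_i_next, gamma=GAMMA):
    """TD error computed privately by agent i from its own reward and critic values."""
    return r_i + gamma * v_i_next - v_i_s


def aggregate_td_errors(shared_td_errors):
    """Average of the scalar TD errors the agents exchange (hypothetical aggregation rule)."""
    return sum(shared_td_errors) / len(shared_td_errors)


def team_td_error(rewards, v_s, v_next, gamma=GAMMA):
    """Reference TD error of the team-average reward and team-average value.

    Computing it directly would require every agent's private reward and critic.
    """
    n = len(rewards)
    return sum(rewards) / n + gamma * sum(v_next) / n - sum(v_s) / n


if __name__ == "__main__":
    rewards = [1.0, -0.5, 0.25]  # private rewards r_i
    v_s = [2.0, 4.0, 1.0]        # private critic values V_i(s)
    v_next = [3.0, 1.0, 5.0]     # private critic values V_i(s')
    shared = [local_td_error(r, a, b) for r, a, b in zip(rewards, v_s, v_next)]
    print(aggregate_td_errors(shared))          # computed from shared scalars only
    print(team_td_error(rewards, v_s, v_next))  # same value, up to rounding
```

In an actor-critic loop, each agent would then feed the aggregated TD error into its own policy-gradient step. The paper's exact update rule may differ from this simple average.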
Findings
Algorithm converges to team-optimal policies.
Communication burden is quadratic in network size.
Effective in large, unreliable networks.
Abstract
In decentralized cooperative multi-agent reinforcement learning, agents can aggregate information from one another to learn policies that maximize a team-average objective function. Despite their willingness to cooperate, individual agents may find it undesirable to directly share information about their local state, reward, and value function because of privacy concerns. In this work, we introduce a decentralized actor-critic algorithm with TD error aggregation that does not violate privacy and assumes that communication channels are subject to time delays and packet dropouts. The cost we pay for making such weak assumptions is an increased communication burden for every agent, as measured by the dimension of the transmitted data. Interestingly, the communication burden is only quadratic in the graph size, which renders the algorithm applicable in large networks. We provide…
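The abstract does not specify the communication protocol. Purely as an illustration of how scalar TD errors could reach every agent over an unreliable graph, the sketch below assumes a simple flooding scheme. This scheme is hypothetical and not the paper's method. Each agent relays every source-tagged TD error it has learned for one time step to its neighbors, and each packet is independently dropped with probability `drop_prob`. A dropped packet only delays information until a later round. Relayed messages grow to carry up to N entries, which shows, under this assumed scheme, how tolerating unreliable links can enlarge each transmission. The functions `flood_td_errors` and `team_average_if_complete` and their parameters are hypothetical.

```python
import random


def flood_td_errors(td_errors, neighbors, drop_prob, rounds, seed=0):
    """Relay source-tagged TD errors for one time step over a lossy graph.

    td_errors: local TD error of each agent, indexed by agent id.
    neighbors: dict mapping agent id -> list of neighbor ids.
    Returns, for each agent, a dict {source_id: td_error} of what it has received.
    """
    rng = random.Random(seed)
    n = len(td_errors)
    known = [{i: td_errors[i]} for i in range(n)]  # each agent starts with its own value
    for _ in range(rounds):
        sent = [dict(k) for k in known]  # snapshot of what each agent transmits this round
        for i in range(n):
            for j in neighbors[i]:
                if rng.random() >= drop_prob:  # packet i -> j is delivered
                    known[j].update(sent[i])
    return known


def team_average_if_complete(known_i, n):
    """Return the average TD error once all n entries have arrived, else None."""
    if len(known_i) < n:
        return None
    return sum(known_i.values()) / n


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 4-agent ring graph
    deltas = [0.4, -1.2, 0.8, 0.0]
    known = flood_td_errors(deltas, ring, drop_prob=0.3, rounds=6)
    for agent, table in enumerate(known):
        print(agent, sorted(table), team_average_if_complete(table, len(deltas)))
```

Raising `drop_prob` or lowering `rounds` leaves some agents with incomplete tables. In that case `team_average_if_complete` returns `None`, and the agent would have to wait for later rounds before updating.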
Taxonomy
Topics: Game Theory and Applications · Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics
