Difference Rewards Policy Gradients

Jacopo Castellini; Sam Devlin; Frans A. Oliehoek; Rahul Savani

arXiv:2012.11258 · cs.MA · December 20, 2024

Open Access

TL;DR

This paper introduces Dr.Reinforce, a policy gradient algorithm for multi-agent reinforcement learning that tackles credit assignment explicitly by using difference rewards to learn decentralized policies.

Contribution

The paper presents a new algorithm combining difference rewards with policy gradients, avoiding Q-function learning, and extends it to unknown reward scenarios with a learned reward network.

Findings

01. Dr.Reinforce outperforms existing methods in multi-agent credit assignment tasks.

02. The approach effectively learns decentralized policies with known and unknown rewards.

03. It simplifies credit assignment by directly differencing the reward function.
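
Differencing the reward function directly can be written out as follows. This is a sketch in assumed notation: the symbols $R$ (shared reward), $a^{-i}$ (actions of all agents except $i$), $c^i$ (a counterfactual action for agent $i$), and $\Delta R^i$ are illustrative and are not quoted from the paper.

```latex
% Difference reward for agent i with a fixed default action c^i (assumed notation)
\Delta R^i(a^i \mid s, a^{-i}) = R(s, \langle a^i, a^{-i} \rangle) - R(s, \langle c^i, a^{-i} \rangle)

% Variant that averages the counterfactual over agent i's own policy
\Delta R^i(a^i \mid s, a^{-i}) = R(s, \langle a^i, a^{-i} \rangle) - \sum_{c^i} \pi_{\theta^i}(c^i \mid s)\, R(s, \langle c^i, a^{-i} \rangle)

% REINFORCE-style gradient estimate using the discounted difference return
\nabla_{\theta^i} J \approx \sum_{t=0}^{T-1} \nabla_{\theta^i} \log \pi_{\theta^i}(a^i_t \mid s_t) \sum_{l=0}^{T-t-1} \gamma^{l}\, \Delta R^i(a^i_{t+l} \mid s_{t+l}, a^{-i}_{t+l})
```

Because only agent $i$'s action changes between the two terms, any part of the reward that does not depend on that action cancels, which is what isolates the agent's own contribution.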

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an…
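
To make the idea of combining difference rewards with a policy gradient concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a single-step, stateless game with a known reward function, tabular softmax policies, and a counterfactual baseline averaged over each agent's own policy. The names `softmax`, `difference_rewards`, and `reinforce_step` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def difference_rewards(reward_fn, joint_action, policies):
    """Return one difference reward per agent for a single joint action.

    reward_fn(joint_action) -> float is the known shared reward (assumed).
    policies[i] is agent i's action distribution; it weights the
    counterfactual rewards obtained by swapping only agent i's action.
    """
    shared = reward_fn(joint_action)
    diffs = []
    for i, pi in enumerate(policies):
        baseline = 0.0
        for c, prob in enumerate(pi):
            # Replace agent i's action with c, keep everyone else fixed.
            counterfactual = joint_action[:i] + (c,) + joint_action[i + 1:]
            baseline += prob * reward_fn(counterfactual)
        diffs.append(shared - baseline)
    return diffs


def reinforce_step(logits, reward_fn, rng, lr=0.1):
    """One REINFORCE-style update per agent, weighted by its difference reward."""
    policies = [softmax(l) for l in logits]
    joint_action = tuple(
        rng.choices(range(len(p)), weights=p)[0] for p in policies
    )
    diffs = difference_rewards(reward_fn, joint_action, policies)
    new_logits = []
    for l, p, a, d in zip(logits, policies, joint_action, diffs):
        # Gradient of log softmax w.r.t. the logits: one-hot(a) - p.
        grad = [(1.0 if k == a else 0.0) - p[k] for k in range(len(p))]
        new_logits.append([l[k] + lr * d * grad[k] for k in range(len(l))])
    return new_logits, joint_action, diffs


if __name__ == "__main__":
    # Toy shared reward (an assumption): each agent adds 1 by choosing action 1.
    reward = lambda a: float(a[0] == 1) + float(a[1] == 1)
    rng = random.Random(0)
    logits = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(200):
        logits, _, _ = reinforce_step(logits, reward, rng, lr=0.5)
    print([softmax(l) for l in logits])
```

In this toy reward each agent's difference reward depends only on its own action, so the other agent's choices add no noise to its update. A sequential version would accumulate discounted difference rewards over a trajectory instead of using a single step.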

Taxonomy

Topics: Reinforcement Learning in Robotics