The challenge of hidden gifts in multi-agent reinforcement learning
Dane Malenfant, Blake A. Richards

TL;DR
This paper investigates the challenge of credit assignment in multi-agent reinforcement learning when beneficial actions by others are hidden, introducing a simple grid-world task and proposing a correction to improve learning.
Contribution
It introduces a novel MARL task illustrating hidden gift challenges and proposes a variance reduction correction for policy gradient agents to enhance collective learning.
Findings
State-of-the-art MARL algorithms fail on the hidden gift task.
Decentralized actor-critic agents succeed with action history information.
A correction term reduces variance and improves convergence to collective success.
Abstract
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the…
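The reward structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' environment: it omits the grid and movement entirely, and the class name `SharedKeyTask`, the action strings, and both reward values are hypothetical choices made only for illustration. It shows why dropping the key matters: the drop earns nothing at the moment it happens, yet the collective reward cannot be reached without it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INDIVIDUAL_REWARD = 1.0   # assumed value, not from the paper
COLLECTIVE_REWARD = 10.0  # assumed value, not from the paper


@dataclass
class SharedKeyTask:
    """Toy abstraction of the shared-key task (hypothetical, no grid)."""
    n_agents: int = 2
    key_holder: Optional[int] = None  # None means the key is lying free
    doors_open: List[bool] = field(default_factory=list)
    collective_paid: bool = False

    def __post_init__(self) -> None:
        if not self.doors_open:
            self.doors_open = [False] * self.n_agents

    def step(self, agent: int, action: str) -> List[float]:
        """Apply one agent's action and return a reward for every agent."""
        rewards = [0.0] * self.n_agents
        if action == "pickup" and self.key_holder is None:
            self.key_holder = agent
        elif action == "drop" and self.key_holder == agent:
            # The "gift": no immediate reward for the agent that drops.
            self.key_holder = None
        elif (action == "open" and self.key_holder == agent
              and not self.doors_open[agent]):
            self.doors_open[agent] = True
            rewards[agent] += INDIVIDUAL_REWARD
            if all(self.doors_open) and not self.collective_paid:
                # Every door is open: the whole group is rewarded once.
                self.collective_paid = True
                rewards = [r + COLLECTIVE_REWARD for r in rewards]
        return rewards
```

In this sketch, if agent 0 opens its door and keeps the key, agent 1 can never open its own door, so the collective reward stays out of reach. Only after agent 0 takes the unrewarded `"drop"` action can agent 1 pick up the key and trigger the group payoff.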
