Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning

Aditya Kapoor; Sushant Swamy; Kale-ab Tessera; Mayank Baranwal; Mingfei Sun; Harshad Khadilkar; Stefano V. Albrecht

arXiv:2412.14779 · cs.MA · December 20, 2024

Open Access

TL;DR

This paper introduces TAR$^2$, a reward redistribution method that improves learning stability and speed in multi-agent reinforcement learning with sparse rewards by decomposing global rewards into agent-specific, time-step-specific signals.

Contribution

The paper proposes TAR$^2$, a novel reward redistribution technique that addresses agent-temporal credit assignment while preserving optimal policies, supported by theoretical proof and empirical validation.

Findings

01. TAR$^2$ stabilizes and accelerates learning.

02. When combined with single-agent algorithms, TAR$^2$ matches or outperforms traditional multi-agent methods.

03. TAR$^2$ is equivalent to potential-based reward shaping.
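
For reference, the standard textbook form of potential-based reward shaping can be written as follows. This is the general definition only; the potential function $\Phi$ that TAR$^2$ corresponds to is not stated on this page, so $\Phi$, the discount $\gamma$, and the horizon $T$ here are generic symbols rather than the paper's notation.

```latex
% Shaped reward: the original reward plus a potential difference.
r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + \gamma \Phi(s_{t+1}) - \Phi(s_t)

% Over a trajectory of length T the shaping terms telescope:
\sum_{t=0}^{T-1} \gamma^{t} \bigl( \gamma \Phi(s_{t+1}) - \Phi(s_t) \bigr)
  = \gamma^{T} \Phi(s_T) - \Phi(s_0)
```

Because the added terms collapse to a quantity that depends only on the start and end states, the shaping cannot change which actions are preferred along the way, which is why the optimal policy is preserved.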

Abstract

In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards, particularly in long-horizon tasks where it is challenging to evaluate actions at intermediate time steps. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem by redistributing sparse rewards both temporally and across agents. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards. We theoretically prove that TAR$^2$ is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirical results demonstrate that TAR$^2$ stabilizes and accelerates the learning process. Additionally, we show that when TAR$^2$ is integrated with single-agent reinforcement learning…
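
The two-level decomposition described in the abstract (first over time steps, then over agents) can be illustrated with a minimal sketch. It assumes the weights are obtained by a softmax over raw scores, which is an assumption made here for illustration; the function names and the hand-written scores are hypothetical, and in the paper the contributions would come from a learned model rather than fixed numbers.

```python
import math

def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def redistribute(episodic_reward, temporal_scores, agent_scores):
    """Split one episodic reward into per-step, per-agent rewards.

    temporal_scores: one hypothetical score per time step (length T).
    agent_scores: T lists of hypothetical per-agent scores (each length N).
    Returns a T x N nested list whose entries sum to episodic_reward.
    """
    w_time = softmax(temporal_scores)  # share of the reward per time step
    rewards = []
    for t, w_t in enumerate(w_time):
        w_agent = softmax(agent_scores[t])  # share per agent at step t
        rewards.append([episodic_reward * w_t * w_a for w_a in w_agent])
    return rewards

# Illustrative inputs only (assumed, not from the paper): 3 steps, 2 agents.
temporal_scores = [0.0, 1.0, 2.0]
agent_scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
dense = redistribute(10.0, temporal_scores, agent_scores)
total = sum(sum(row) for row in dense)
```

Because both sets of weights are normalized, the dense rewards add back up to the original episodic reward, so the redistribution changes when and to whom credit is given without changing the total return.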

Taxonomy

Topics: Traffic control and management