Multi-Agent Reinforcement Learning with Reward Delays
Yuyang Zhang, Runyu Zhang, Yuantao Gu, Na Li

TL;DR
This paper introduces MARL algorithms that handle both finite and infinite reward delays, with proven rates of convergence to a coarse correlated equilibrium.
Contribution
The paper develops MARL algorithms based on V-learning that handle reward delays, proves convergence rates for them, and extends them to infinite delays through a reward skipping scheme.
Findings
Achieves convergence to a coarse correlated equilibrium (CCE) with an explicit rate when delays are finite.
Extends to infinite delays with a reward skipping scheme.
Provides a theoretical analysis of how delays affect learning performance.
Abstract
This paper considers multi-agent reinforcement learning (MARL) where the rewards are received after delays and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at a rate that depends on the number of episodes, the planning horizon, the size of the state space, the size of the largest action space, and a measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme. It achieves a convergence rate similar to that of the finite-delay case.
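To make the delayed-reward setting concrete, here is a minimal Python sketch of one agent's reward stream in which each reward arrives a variable number of episodes after it was generated. It is an illustration only, not the paper's algorithm: the class `DelayedRewardBuffer`, its `max_delay` cutoff, and the rule of dropping rewards whose delay is infinite or exceeds the cutoff are all assumptions, loosely standing in for the idea of skipping rewards that never arrive.

```python
from collections import defaultdict


class DelayedRewardBuffer:
    """Hypothetical helper: holds rewards until their delay has elapsed."""

    def __init__(self, max_delay=None):
        # Assumed cutoff: rewards delayed longer than this are skipped.
        self.max_delay = max_delay
        # Maps arrival episode -> list of (source episode, reward).
        self._pending = defaultdict(list)
        self.skipped = 0

    def emit(self, episode, reward, delay):
        """Register a reward generated in `episode` that arrives `delay` episodes later."""
        too_late = self.max_delay is not None and delay > self.max_delay
        if delay == float("inf") or too_late:
            # Assumed skipping rule: this reward is never used for an update.
            self.skipped += 1
            return
        self._pending[episode + delay].append((episode, reward))

    def collect(self, episode):
        """Return and clear every reward that arrives in `episode`."""
        return self._pending.pop(episode, [])


buf = DelayedRewardBuffer(max_delay=3)
buf.emit(episode=0, reward=1.0, delay=2)             # arrives in episode 2
buf.emit(episode=1, reward=0.5, delay=1)             # also arrives in episode 2
buf.emit(episode=1, reward=0.7, delay=float("inf"))  # never arrives, so skipped
print(buf.collect(2))  # [(0, 1.0), (1, 0.5)]
print(buf.skipped)     # 1
```

In a multi-agent run, each agent would keep its own buffer, since the delays differ across agents as well as across time steps. A learner would call `collect` at the start of each episode and update only on the rewards that have actually arrived.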
Taxonomy
Topics: Game Theory and Applications · Reinforcement Learning in Robotics
