TL;DR
This paper introduces a Byzantine-resilient decentralized TD learning algorithm with linear function approximation that evaluates a fixed policy in multi-agent reinforcement learning environments even when some agents are malicious.
Contribution
It proposes a novel trimmed-mean based algorithm that ensures robustness and convergence in the presence of Byzantine agents in decentralized RL.
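The trimmed-mean idea can be illustrated with the standard coordinate-wise trimmed mean. For each coordinate, an agent discards the b largest and b smallest values it receives and averages the rest. This is a minimal sketch under that assumption. The function name `coordinate_trimmed_mean` and the parameter `b` (an assumed upper bound on the number of Byzantine in-neighbours) are hypothetical, and the paper's exact aggregation rule may differ.

```python
import numpy as np


def coordinate_trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean of a stack of parameter vectors.

    For every coordinate, the b largest and b smallest entries are discarded
    and the remaining entries are averaged. Requires more than 2*b vectors.
    """
    stacked = np.asarray(vectors, dtype=float)  # shape (m, d)
    m = stacked.shape[0]
    if m <= 2 * b:
        raise ValueError("need more than 2*b vectors to trim b from each side")
    ordered = np.sort(stacked, axis=0)    # sort each coordinate independently
    return ordered[b:m - b].mean(axis=0)  # drop the b lowest and b highest per coordinate


# Four honest neighbours report values near (1, 2); one Byzantine neighbour does not.
received = [
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.1],
    [1.0, 2.0],
    [1e6, -1e6],  # adversarial message
]
print(coordinate_trimmed_mean(received, b=1))
print(np.mean(received, axis=0))
```

With `b=1`, the adversarial message is discarded in every coordinate, so the trimmed mean stays near (1.03, 1.97). The plain average is dragged to roughly (2e5, -2e5).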
Findings
The paper establishes a finite-time convergence rate for the algorithm.
Its asymptotic learning error stays bounded despite Byzantine adversaries.
Numerical experiments confirm robustness and effectiveness.
Abstract
This paper considers the policy evaluation problem in a multi-agent reinforcement learning (MARL) environment over decentralized and directed networks. The focus is on decentralized temporal difference (TD) learning with linear function approximation in the presence of unreliable or even malicious agents, termed Byzantine agents. To evaluate the quality of a fixed policy in a common environment, agents usually run decentralized TD(λ) collaboratively. However, when some Byzantine agents behave adversarially, decentralized TD(λ) is unable to learn an accurate linear approximation of the true value function. We propose a trimmed-mean based Byzantine-resilient decentralized TD(λ) algorithm to perform policy evaluation in this setting. We establish the finite-time convergence rate, as well as the asymptotic learning error in the presence of Byzantine…
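The following toy simulation sketches one way such an algorithm could run. It rests on several assumptions that this summary does not state:

- All agents observe a common state trajectory generated by the fixed policy.
- Each agent holds a private local reward.
- Agents communicate over a complete directed graph.
- A single Byzantine agent broadcasts Gaussian noise.
- Each regular agent first aggregates received parameters with a coordinate-wise trimmed mean, then applies a local TD(λ) step with an accumulating eligibility trace.

The names `trimmed_mean` and `run`, the random environment, and all constants are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def trimmed_mean(rows, b):
    """Coordinate-wise trimmed mean: drop the b largest and b smallest values per coordinate."""
    ordered = np.sort(np.asarray(rows, dtype=float), axis=0)
    return ordered[b:len(ordered) - b].mean(axis=0)


def run(num_steps=5000, seed=0):
    """Toy trimmed-mean decentralized TD(lambda); returns the regular agents' parameters."""
    rng = np.random.default_rng(seed)
    num_states, d = 5, 3
    gamma, lam, alpha = 0.9, 0.5, 0.01
    b = 1  # assumed bound on Byzantine in-neighbours per agent

    # Assumption: a random Markov chain stands in for the environment under the fixed policy.
    P = rng.random((num_states, num_states))
    P /= P.sum(axis=1, keepdims=True)
    Phi = rng.standard_normal((num_states, d))  # row s is the feature vector phi(s)

    regular, byzantine = [0, 1, 2, 3], [4]
    n = len(regular) + len(byzantine)
    R = rng.random((n, num_states))  # assumption: private local reward of agent i in state s

    # Assumption: complete directed graph, so every agent hears from all others.
    in_neighbours = {i: [j for j in range(n) if j != i] for i in range(n)}

    theta = np.zeros((n, d))
    traces = np.zeros((n, d))
    s = 0
    for _ in range(num_steps):
        s_next = rng.choice(num_states, p=P[s])
        for j in byzantine:
            theta[j] = rng.normal(0.0, 100.0, size=d)  # arbitrary (here: noisy) messages
        new_theta = theta.copy()
        for i in regular:
            # 1) Aggregate own parameters and in-neighbours' messages robustly.
            received = [theta[i]] + [theta[j] for j in in_neighbours[i]]
            aggregated = trimmed_mean(received, b)
            # 2) Local TD(lambda) step with an accumulating eligibility trace.
            traces[i] = gamma * lam * traces[i] + Phi[s]
            delta = R[i, s] + gamma * Phi[s_next] @ theta[i] - Phi[s] @ theta[i]
            new_theta[i] = aggregated + alpha * delta * traces[i]
        theta = new_theta
        s = s_next
    return theta[regular]


print(run())
```

Calling `run()` returns the parameter vectors of the four regular agents. The trimming parameter `b` is at least the number of Byzantine in-neighbours. As a result, each coordinate of the aggregate stays within the range of values held by regular agents, so the noisy messages cannot pull the regular agents away. Replacing `trimmed_mean` with a plain average lets the noise enter every update.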
