Communication Efficient Parallel Reinforcement Learning
Mridul Agarwal, Bhargav Ganguly, Vaneet Aggarwal

TL;DR
This paper introduces a parallel reinforcement learning algorithm in which agents communicate with a central server only occasionally. It cuts the number of communication rounds while keeping a near-optimal regret bound, and the paper evaluates it empirically.
Contribution
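The paper proposes \NAM, an algorithm that runs at each agent and lets multiple agents minimize their cumulative regret while communicating with a central server only infrequently. It comes with a provable regret bound and a bound on the number of communication rounds that grows only logarithmically in MT.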
Findings
Achieves a regret bound of O(DS√(MAT))
Reduces communication rounds to O(MSA log(MT))
Performs comparably in experiments to a baseline that communicates at every step
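A back-of-the-envelope reading of the two bounds, under stated assumptions. First, the single-agent baseline is taken to be UCRL2's regret of order DS√(AT) up to logarithmic factors. Second, the synchronization trigger is taken, hypothetically, to be an agent's visit count to a pair (s,a) since the last sync reaching max(1, N(s,a)/M), where N(s,a) is the server's total count for that pair. The abstract only says "a certain threshold", so the second derivation shows how such a rule could produce a bound of this form; it is not the paper's proof.

```latex
% (1) M agents learning in isolation, each with single-agent regret ~ DS\sqrt{AT}
%     (logarithmic factors dropped):
M \cdot DS\sqrt{AT} \;=\; \sqrt{M}\cdot DS\sqrt{MAT}
% so pooling data improves the total regret by a factor of about \sqrt{M}.

% (2) Syncs attributable to one pair (s,a), ASSUMING each such sync raises
%     N(s,a) by at least \max(1, N(s,a)/M) and N(s,a) \le MT.
%     While N(s,a) < M: at most M syncs (N grows by >= 1 each time).
%     Once N(s,a) >= M: N grows by a factor >= 1 + 1/M each time,
%     and \ln(1 + 1/M) \ge 1/(M+1).
\#\text{syncs}(s,a) \;\le\; M + \frac{\ln T}{\ln(1 + 1/M)} + 1
  \;\le\; M + (M+1)\ln T + 1 \;=\; O\big(M \log(MT)\big)
% Summing over the SA pairs gives O(MSA \log(MT)) rounds,
% consistent with the stated communication bound.
```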
Abstract
We consider the problem where M agents interact with M identical and independent environments with S states and A actions using reinforcement learning for T rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds. We provide \NAM, which runs at each agent, and prove that the total cumulative regret of the M agents is upper bounded as O(DS√(MAT)) for a Markov Decision Process with diameter D, number of states S, and number of actions A. The agents synchronize after their visitations to any state-action pair exceed a certain threshold. Using this, we obtain a bound of O(MSA log(MT)) on the total number of communication rounds. Finally, we evaluate the algorithm against multiple environments and demonstrate that…
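The synchronization rule in the abstract can be sketched as a small simulation. This is a hypothetical illustration, not the authors' implementation: the trigger max(1, N(s,a)/M), the function name simulate_sync_rounds, and the use of uniformly random (state, action) visits in place of a learned policy are all assumptions, since the abstract does not give the exact threshold. The sketch only counts communication rounds; it does not learn or measure regret.

```python
import random


def simulate_sync_rounds(num_agents, num_states, num_actions, horizon, seed=0):
    """Count server synchronizations under a HYPOTHETICAL visit-count trigger.

    Assumptions (illustrative, not taken from the paper):
      * every agent picks a uniformly random (state, action) pair each step,
        standing in for actually executing a learned policy;
      * an agent requests a sync once its visits to some pair since the last
        sync reach max(1, N(s, a) / M), where N(s, a) is the server's total.
    Returns (number_of_sync_rounds, total_visit_counts_per_pair).
    """
    rng = random.Random(seed)
    pairs = [(s, a) for s in range(num_states) for a in range(num_actions)]
    server_counts = dict.fromkeys(pairs, 0)  # N(s, a) as of the last sync
    # Visits each agent has collected since the last sync.
    local_counts = [dict.fromkeys(pairs, 0) for _ in range(num_agents)]

    def merge_into_server():
        # Each agent uploads its new visits; the server adds them to N(s, a).
        for counts in local_counts:
            for p in pairs:
                server_counts[p] += counts[p]
                counts[p] = 0

    rounds = 0
    for _ in range(horizon):
        triggered = False
        for counts in local_counts:
            p = rng.choice(pairs)
            counts[p] += 1
            # Hypothetical threshold: local visits reach max(1, N(s, a) / M).
            if counts[p] >= max(1, server_counts[p] / num_agents):
                triggered = True
        if triggered:
            merge_into_server()
            rounds += 1

    merge_into_server()  # flush leftovers so totals equal num_agents * horizon
    return rounds, server_counts
```

For instance, simulate_sync_rounds(4, 5, 2, 2000) reports at most 430 rounds over the 2000 steps (by the counting argument that once N(s,a) ≥ M, a pair that triggers a sync must grow its server count by a factor of at least 1 + 1/M before it can trigger again), while the per-pair totals always sum to 4 × 2000 = 8000. That multiplicative growth is what keeps the round count logarithmic in T under the assumed rule.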
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
