Communication Efficient Parallel Reinforcement Learning

Mridul Agarwal; Bhargav Ganguly; Vaneet Aggarwal

arXiv:2102.10740 · cs.LG · February 23, 2021

TL;DR

This paper introduces a communication-efficient parallel reinforcement learning algorithm that keeps a near-optimal regret bound while the agents communicate only infrequently, and evaluates it empirically.

Contribution

The paper proposes \NAM, an algorithm that runs at each agent and lets multiple agents minimize regret while communicating only infrequently with a central server. It comes with a proven regret bound and a proven bound on the number of communication rounds.

Findings

01

Achieves a total regret bound of Õ(DS√(MAT)) across all M agents

02

Reduces communication rounds to O(MSA log(MT))

03

Performs comparably to continuous communication algorithms in experiments
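
As a short worked step (not taken from the paper's text), the total regret bound can be read per agent by dividing by $M$. The symbols $D$, $S$, $A$, $M$, $T$ are those of the abstract; the only addition is the division itself:

```latex
\[
  \frac{1}{M}\,\tilde{O}\!\left(D S \sqrt{M A T}\right)
  \;=\; \tilde{O}\!\left(D S \sqrt{\frac{A T}{M}}\right)
\]
```

Read this way, the average regret per agent shrinks by a factor of $\sqrt{M}$ as more agents share data, while the number of communication rounds grows only logarithmically in $MT$.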

Abstract

We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions using reinforcement learning for $T$ rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds. We provide \NAM, which runs at each agent, and prove that the total cumulative regret of $M$ agents is upper bounded as $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$. The agents synchronize after their visitations to any state-action pair exceed a certain threshold. Using this, we obtain a bound of $O(MSA \log(MT))$ on the total number of communication rounds. Finally, we evaluate the algorithm against multiple environments and demonstrate that…
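
The synchronization rule described in the abstract can be illustrated with a minimal sketch. The abstract only says "a certain threshold", so the specific rule below is an assumption chosen for illustration: an agent triggers a sync once its visits to some state-action pair since the last sync reach `max(1, N(s, a)) / M`, where `N(s, a)` is the pooled count at the last sync. The function names `should_sync` and `count_sync_rounds`, and the uniformly random visits standing in for real MDP dynamics, are likewise hypothetical.

```python
import math
import numpy as np


def should_sync(local_visits, global_visits, num_agents):
    """Return True if any agent's local count for some (s, a) reaches the threshold.

    ASSUMPTION: the threshold max(1, N(s, a)) / M is illustrative only; the
    abstract does not state the threshold the paper uses.
    local_visits has shape (M, S, A); global_visits has shape (S, A).
    """
    threshold = np.maximum(1, global_visits) / num_agents
    return bool(np.any(local_visits >= threshold))


def count_sync_rounds(num_agents, num_states, num_actions, horizon, seed=0):
    """Simulate lockstep agents and count how many sync rounds are triggered."""
    rng = np.random.default_rng(seed)
    global_visits = np.zeros((num_states, num_actions))
    local_visits = np.zeros((num_agents, num_states, num_actions))
    rounds = 0
    for _ in range(horizon):
        # Stand-in for MDP dynamics: each agent visits a uniformly random (s, a).
        s = rng.integers(num_states, size=num_agents)
        a = rng.integers(num_actions, size=num_agents)
        local_visits[np.arange(num_agents), s, a] += 1
        if should_sync(local_visits, global_visits, num_agents):
            # Server pools every agent's counts, then agents start a new epoch.
            global_visits += local_visits.sum(axis=0)
            local_visits[:] = 0
            rounds += 1
    return rounds


if __name__ == "__main__":
    M, S, A, T = 4, 5, 2, 2000
    rounds = count_sync_rounds(M, S, A, T)
    print(f"sync rounds: {rounds} out of {T} steps")
```

Under this assumed rule, every sync triggered by a pair grows that pair's pooled count by at least a factor of $(1 + 1/M)$, and the pooled count never exceeds $MT$, so the number of rounds is at most $SA\,(1 + \log(MT)/\log(1 + 1/M))$, which is of order $MSA \log(MT)$. That matches the shape of the bound quoted in the abstract, but it is a property of this sketch, not a restatement of the paper's proof.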

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization