Settling the Communication Complexity for Distributed Offline Reinforcement Learning
Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang

TL;DR
This paper establishes fundamental limits on communication efficiency in distributed offline reinforcement learning, providing lower bounds and algorithms that achieve near-optimal risk under strict communication constraints.
Contribution
It introduces the first minimax lower bounds for distributed offline RL and proposes algorithms that nearly attain these bounds under single-round communication.
Findings
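For contextual bandits, the number of communicated bits must scale at least as Ω(AC), where A is the number of actions and C is the context dimension, to match the centralised minimax optimal rate.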
Proposed algorithms based on least-squares and Monte-Carlo estimates achieve near-optimal risk.
Temporal difference methods are less effective under the communication constraints.
Abstract
We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem, but only a single round of communication is allowed and there is a budget constraint on the total amount of information (in bits) that each machine can send out. For value function prediction in contextual bandits, and in both episodic and non-episodic MDPs, we establish information-theoretic lower bounds on the minimax risk for distributed statistical estimators; this reveals the minimum amount of communication required by any offline RL algorithm. Specifically, for contextual bandits, we show that the number of bits must scale at least as Ω(AC) to match the centralised minimax optimal rate, where A is the number of actions and C is the context dimension; meanwhile, we reach similar results in the MDP settings. Furthermore, we…
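The single-round, bit-budgeted setting described above can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it assumes a scalar value to be estimated, Bernoulli returns bounded in [0, 1], and a simple uniform quantiser, and all function names (`quantize`, `local_message`, `aggregate`, `simulate`) are hypothetical. Each machine forms a Monte-Carlo average of its local returns, compresses it to a fixed number of bits, and sends it once; the centre decodes and averages the messages.

```python
import random


def quantize(x, lo, hi, bits):
    """Clip x to [lo, hi] and return the integer code of the nearest of 2**bits uniform levels."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * levels)


def dequantize(code, lo, hi, bits):
    """Map an integer code back to its level in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


def local_message(returns, lo, hi, bits):
    """Machine side: Monte-Carlo mean of the local returns, compressed to `bits` bits."""
    mean = sum(returns) / len(returns)
    return quantize(mean, lo, hi, bits)


def aggregate(codes, lo, hi, bits):
    """Centre side: decode every message and average (one communication round)."""
    return sum(dequantize(c, lo, hi, bits) for c in codes) / len(codes)


def simulate(num_machines=20, samples_per_machine=200, bits=8,
             true_value=0.3, seed=0):
    """Toy experiment (assumed setup): Bernoulli(true_value) returns on every machine."""
    rng = random.Random(seed)
    codes = []
    for _ in range(num_machines):
        returns = [1.0 if rng.random() < true_value else 0.0
                   for _ in range(samples_per_machine)]
        codes.append(local_message(returns, 0.0, 1.0, bits))
    return aggregate(codes, 0.0, 1.0, bits)


if __name__ == "__main__":
    for b in (1, 2, 4, 8):
        print(b, "bits per machine ->", simulate(bits=b))
```

Running the script with different values of `bits` gives a rough feel for how a tighter per-machine budget coarsens each message; the paper's lower bounds concern how such budgets limit the achievable risk of any estimator, not this particular quantiser.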
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Evolutionary Algorithms and Applications
