Gap-Dependent Bounds for Federated $Q$-learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue

TL;DR
This paper introduces a gap-dependent analysis for federated Q-learning in finite-horizon MDPs, achieving improved regret and communication bounds by exploiting MDP structures, unlike previous worst-case approaches.
Contribution
It provides the first gap-dependent regret and communication bounds for federated Q-learning, revealing faster convergence and reduced communication costs under benign MDP conditions.
Findings
Achieves $\log T$ regret bounds using gap-dependent analysis.
Refines communication cost bounds to remove dependence on $MSA$ in the $\log T$ term.
Shows a multi-agent speedup pattern in the regret bounds.
Abstract
We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our…
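The abstract refers to a strictly positive suboptimality gap without spelling it out. As a rough illustration only, the sketch below computes gaps for a tiny tabular finite-horizon MDP by backward induction, assuming the definition common in the gap-dependent literature, $\Delta_h(s,a) = V_h^*(s) - Q_h^*(s,a)$, with the minimum gap taken over the strictly positive entries. The function name `suboptimality_gaps`, the array layout, and the toy MDP are hypothetical and are not taken from the paper.

```python
import numpy as np

def suboptimality_gaps(P, R):
    """Compute gaps V*_h(s) - Q*_h(s, a) by backward induction.

    P: transition probabilities, shape (H, S, A, S)  (assumed layout)
    R: rewards, shape (H, S, A)                      (assumed layout)
    Returns (gaps, gap_min), where gaps has shape (H, S, A) and
    gap_min is the smallest strictly positive gap (0.0 if none).
    """
    H, S, A = R.shape
    V_next = np.zeros(S)               # value after the last step is zero
    gaps = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q = R[h] + P[h] @ V_next       # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)              # optimal value at step h
        gaps[h] = V[:, None] - Q       # zero for optimal actions
        V_next = V
    positive = gaps[gaps > 1e-12]
    gap_min = float(positive.min()) if positive.size else 0.0
    return gaps, gap_min

# Toy MDP (hypothetical): H = 2 steps, S = 2 states, A = 2 actions.
# Action 0 always pays 1.0, action 1 always pays 0.5, transitions are uniform.
H, S, A = 2, 2, 2
P = np.full((H, S, A, S), 0.5)
R = np.zeros((H, S, A))
R[:, :, 0] = 1.0
R[:, :, 1] = 0.5

gaps, gap_min = suboptimality_gaps(P, R)
print(gaps)
print(gap_min)
```

In this toy case the optimal action has gap zero everywhere and the other action has the same positive gap at every step and state. In gap-dependent analyses, a larger minimum gap generally makes suboptimal actions easier to rule out, which is the kind of benign structure the abstract says it exploits.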
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
