Gap-Dependent Bounds for Federated $Q$-learning

Haochen Zhang; Zhong Zheng; Lingzhou Xue

arXiv:2502.02859·stat.ML·September 19, 2025



TL;DR

This paper gives a gap-dependent analysis of federated Q-learning in finite-horizon MDPs. By exploiting benign MDP structure, such as a strictly positive suboptimality gap, it obtains tighter regret and communication bounds than earlier worst-case analyses.

Contribution

It provides the first gap-dependent regret and communication bounds for federated Q-learning, revealing faster convergence and reduced communication costs under benign MDP conditions.

Findings

01

Achieves $\log T$ regret bounds using gap-dependent analysis.

02

Refines communication cost bounds to remove dependence on $MSA$ in the $\log T$ term.

03

Shows a multi-agent speedup pattern in the regret bounds.

Abstract

We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing federated reinforcement learning (FRL) methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our…
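The abstract relies on a "strictly positive suboptimality gap" without defining it in the visible text. As a minimal sketch, assuming the definition that is standard in the gap-dependent literature (the gap of action $a$ in state $s$ at step $h$ is $V^*_h(s) - Q^*_h(s,a)$, and the minimum gap is the smallest strictly positive such value), the Python below computes these quantities for a small tabular finite-horizon MDP by backward induction. The function name `suboptimality_gaps`, the array layout, and the toy MDP are illustrative assumptions, not taken from the paper, and the paper's exact definition may differ.

```python
import numpy as np


def suboptimality_gaps(P, R):
    """Backward induction on a known tabular finite-horizon MDP.

    P: transition probabilities, shape (H, S, A, S)  (hypothetical layout)
    R: rewards, shape (H, S, A)
    Returns optimal Q (H, S, A), optimal V (H, S), gaps (H, S, A),
    and the smallest strictly positive gap (None if every gap is zero).
    """
    H, S, A = R.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0 is the terminal value
    for h in range(H - 1, -1, -1):
        # Bellman optimality: Q_h(s,a) = r_h(s,a) + sum_s' P_h(s'|s,a) V_{h+1}(s')
        Q[h] = R[h] + P[h] @ V[h + 1]
        V[h] = Q[h].max(axis=1)
    # Assumed gap definition: V*_h(s) - Q*_h(s,a), zero for optimal actions
    gaps = V[:H, :, None] - Q
    positive = gaps[gaps > 1e-12]
    gap_min = float(positive.min()) if positive.size else None
    return Q, V[:H], gaps, gap_min


# Toy MDP (made up for illustration): H = 2 steps, 2 states, 2 actions.
# Taking action a moves deterministically to state a.
H, S, A = 2, 2, 2
P = np.zeros((H, S, A, S))
for a in range(A):
    P[:, :, a, a] = 1.0
R = np.array([
    [[0.0, 1.0], [0.5, 0.5]],  # step 0 rewards, indexed [state][action]
    [[0.0, 1.0], [0.5, 0.5]],  # step 1 rewards
])

Q_star, V_star, gaps, gap_min = suboptimality_gaps(P, R)
```

In this toy instance every suboptimal action loses a fixed amount of value, so the minimum gap is bounded away from zero. That is the kind of benign structure a gap-dependent analysis exploits, in contrast to a worst-case analysis that must also cover instances where gaps are arbitrarily small.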


Videos

Gap-Dependent Bounds for Federated $Q$-Learning · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
