Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
Safwan Labbi, Daniil Tiapkin, Lorenzo Mancini, Paul Mangold, Eric Moulines

TL;DR
This paper introduces Fed-UCBVI, a communication-efficient federated reinforcement learning algorithm that achieves near-optimal regret and a linear speed-up in the number of agents, even when the agents' environments are heterogeneous.
Contribution
We propose Fed-UCBVI, a novel federated UCB-based algorithm with theoretical regret guarantees and minimal communication overhead in heterogeneous multi-agent settings.
Findings
Regret scales as $O(\sqrt{H^3 |S| |A| T / M})$
Matches the minimax lower bound up to polylogarithmic factors in the single-agent case
Achieves a linear speed-up in the number of agents in the multi-agent setting
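To make the scaling in the findings above concrete, here is a minimal Python sketch that evaluates only the leading term sqrt(H^3 |S| |A| T / M), ignoring constants, logarithmic factors, and the additional heterogeneity term. The function names and the numeric values are hypothetical and chosen purely for illustration; this is arithmetic on the stated bound, not an implementation of Fed-UCBVI.

```python
import math


def regret_leading_term(H, S, A, T, M=1):
    """Leading term sqrt(H^3 * S * A * T / M) of the stated bound.

    Constants, log factors, and the heterogeneity term are ignored.
    """
    return math.sqrt(H**3 * S * A * T / M)


def episodes_for_average_term(eps, H, S, A, M=1):
    """Smallest T (as a real number) for which leading_term / T <= eps.

    Solving sqrt(H^3 * S * A / (M * T)) <= eps for T gives
    T >= H^3 * S * A / (M * eps^2), which shrinks linearly in M.
    """
    return H**3 * S * A / (M * eps**2)


# Hypothetical problem sizes, for illustration only.
H, S, A, T = 10, 20, 5, 100_000

single_agent = regret_leading_term(H, S, A, T, M=1)
four_agents = regret_leading_term(H, S, A, T, M=4)

# With M = 4 the leading term shrinks by a factor sqrt(4) = 2.
ratio = single_agent / four_agents

# Episodes needed to push the averaged leading term below eps = 0.1.
t_single = episodes_for_average_term(0.1, H, S, A, M=1)
t_four = episodes_for_average_term(0.1, H, S, A, M=4)
```

Under these assumptions, the leading term itself drops by a factor of sqrt(M), while the number of episodes needed to reach a fixed averaged value drops by a factor of M, which is one way to read the linear speed-up.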
Abstract
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm (Fed-UCBVI), a novel extension of the UCBVI algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of Fed-UCBVI scales as $O(\sqrt{H^3 |S| |A| T / M})$, with a small additional term due to heterogeneity, where $|S|$ is the number of states, $|A|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, Fed-UCBVI has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Data Stream Mining Techniques · Neural Networks and Applications
