Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Safwan Labbi; Daniil Tiapkin; Lorenzo Mancini; Paul Mangold; Eric Moulines

arXiv:2410.22908·cs.LG·October 31, 2024



TL;DR

This paper introduces Fed-UCBVI, a communication-efficient federated reinforcement learning algorithm that achieves near-optimal regret bounds and a linear speed-up in the number of agents, even in heterogeneous environments.

Contribution

We propose Fed-UCBVI, a novel federated UCB-based algorithm with theoretical regret guarantees and minimal communication overhead in heterogeneous multi-agent settings.

Findings

01

Regret scales as $\tilde{O}(\sqrt{H^{3} |S| |A| T / M})$

02

Matches the minimax lower bound, up to polylogarithmic factors, in the single-agent case

03

Achieves a linear speed-up in the number of agents in the multi-agent setting
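To make the scaling concrete, the short sketch below evaluates only the leading term $\sqrt{H^{3} |S| |A| T / M}$ of the stated bound. It is an arithmetic illustration under loud assumptions: constants, polylogarithmic factors, and the heterogeneity term are all dropped, and the function name `regret_leading_term` and the example values are hypothetical, not taken from the paper.

```python
import math


def regret_leading_term(H: int, S: int, A: int, T: int, M: int) -> float:
    """Leading term sqrt(H^3 * |S| * |A| * T / M) of the stated regret bound.

    Hypothetical helper: constants, polylog factors and the extra
    heterogeneity term are omitted, so this only illustrates scaling.
    """
    return math.sqrt(H**3 * S * A * T / M)


# Same horizon, state/action space and T; only the number of agents M changes.
single_agent = regret_leading_term(H=4, S=2, A=2, T=16, M=1)
four_agents = regret_leading_term(H=4, S=2, A=2, T=16, M=4)
print(single_agent, four_agents)  # 64.0 32.0
```

Under this expression alone, multiplying $M$ by 4 divides the leading term by $\sqrt{4} = 2$; the paper's full bound also contains the heterogeneity term, which this sketch ignores.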

Abstract

In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm (Fed-UCBVI), a novel extension of the UCBVI algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of Fed-UCBVI scales as $\tilde{O}(\sqrt{H^{3} |S| |A| T / M})$, with a small additional term due to heterogeneity, where $|S|$ is the number of states, $|A|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, Fed-UCBVI has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical…
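Fed-UCBVI builds on UCBVI, whose core step is optimistic backward induction: empirical rewards and transitions are combined with an exploration bonus, and the greedy policy is taken with respect to the resulting upper-confidence Q-values. The sketch below shows that single-agent idea for a tabular, finite-horizon MDP with rewards in [0, 1]. It is a simplified illustration, not Fed-UCBVI itself (the federated communication and aggregation scheme is not described here), and the function name `optimistic_value_iteration`, the `bonus_scale` parameter, and the exact bonus and log-term forms are hypothetical choices, not the constants from Azar et al. (2017) or from this paper.

```python
import numpy as np


def optimistic_value_iteration(counts, reward_sums, trans_counts, H,
                               delta=0.1, bonus_scale=1.0):
    """Single-agent, UCBVI-style optimistic backward induction (illustrative sketch).

    counts:       visit counts N[h, s, a], shape (H, S, A)
    reward_sums:  summed observed rewards in [0, 1], shape (H, S, A)
    trans_counts: transition counts N[h, s, a, s'], shape (H, S, A, S)
    Returns optimistic Q-values, shape (H, S, A), and a greedy policy, shape (H, S).
    """
    _, S, A, _ = trans_counts.shape
    n = np.maximum(counts, 1)                      # avoid division by zero
    r_hat = reward_sums / n                        # empirical mean rewards
    p_hat = trans_counts / n[..., None]            # empirical transition kernel
    # Hypothetical log-confidence term and Hoeffding-style bonus of order H*sqrt(L/N).
    log_term = np.log(S * A * H * (counts.sum() + 1) / delta)
    bonus = bonus_scale * H * np.sqrt(log_term / n)

    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)                           # value after the last step is 0
    for h in reversed(range(H)):
        q = r_hat[h] + p_hat[h] @ V_next + bonus[h]
        q[counts[h] == 0] = H - h                  # unvisited pairs: maximal optimism
        Q[h] = np.minimum(q, H - h)                # clip at the largest possible return
        V_next = Q[h].max(axis=1)
    policy = Q.argmax(axis=2)
    return Q, policy
```

Clipping at $H - h$ keeps the optimistic values within the range of achievable returns, and assigning unvisited state-action pairs the maximal value makes the greedy policy try them.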

