Federated Learning for Heterogeneous Bandits with Unobserved Contexts
Jiabin Lin, Shana Moothedath

TL;DR
This paper introduces a federated learning algorithm for multi-arm bandits with unobserved, noisy contexts, enabling multiple agents to collaboratively learn optimal actions with theoretical regret guarantees.
Contribution
It proposes a novel elimination-based federated algorithm for bandits with unobserved contexts and provides regret analysis under linear reward assumptions.
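To make the term "elimination-based" concrete, here is a minimal sketch of one generic elimination round under a linear reward model. It is not the authors' algorithm. It assumes, for illustration only, that all agents share one parameter vector and that the server pools each agent's local statistics. The names `eliminate`, `local_stats`, `width`, and `reg` are hypothetical.

```python
import numpy as np

def eliminate(actions, gram, b, width=1.0, reg=1.0):
    """Return a boolean mask of actions that survive one elimination round.

    actions: (k, d) feature vectors of the currently surviving actions.
    gram:    (d, d) pooled sum of outer products of played features.
    b:       (d,)   pooled sum of reward-weighted played features.
    width:   confidence multiplier (hypothetical tuning parameter).
    reg:     ridge regularization strength (assumption).
    """
    d = actions.shape[1]
    v = gram + reg * np.eye(d)
    theta_hat = np.linalg.solve(v, b)            # ridge estimate
    est = actions @ theta_hat                    # estimated mean rewards
    v_inv_x = np.linalg.solve(v, actions.T)      # shape (d, k)
    radius = width * np.sqrt(np.sum(actions.T * v_inv_x, axis=0))
    # Keep an action only if its upper bound reaches the best lower bound.
    return est + radius >= np.max(est - radius)

def local_stats(actions, theta, pulls):
    """Noiseless statistics of one agent playing every action `pulls` times."""
    gram = pulls * (actions.T @ actions)
    b = pulls * (actions.T @ (actions @ theta))
    return gram, b

actions = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
theta = np.array([1.0, 0.0])  # hidden parameter, used only to simulate rewards

# Two agents report local statistics; the server sums them.
stats = [local_stats(actions, theta, 50) for _ in range(2)]
gram = sum(g for g, _ in stats)
b = sum(v for _, v in stats)

keep = eliminate(actions, gram, b)
```

In this toy run only the first action survives, because the pooled confidence intervals of the other two fall below its lower bound.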
Findings
The algorithm achieves sublinear regret bounds.
Numerical simulations demonstrate competitive performance.
Real-world dataset validation confirms practical effectiveness.
Abstract
We study the problem of federated stochastic multi-arm contextual bandits with unknown contexts, in which M agents face different bandits and collaborate to learn. The communication model consists of a central server, and the agents periodically share their estimates with the server so that they learn to choose optimal actions and minimize the total regret. We assume that the exact contexts are not observable and that the agents observe only a distribution over the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or is produced by a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm…
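The abstract is cut off before it describes the feature vector transformation, so the following is only a sketch of one common way such a transformation works when rewards are linear in the features: replace the unobserved context's feature vector with its expectation under the observed context distribution. Whether the paper uses exactly this construction is an assumption. The function `expected_features` and all array shapes are hypothetical.

```python
import numpy as np

def expected_features(phi, context_probs):
    """Average features over the observed context distribution.

    phi:           (n_contexts, n_actions, d) feature vector per (context, action).
    context_probs: (n_contexts,) probabilities of each context.
    Returns psi with shape (n_actions, d), where
    psi[a] = sum_c context_probs[c] * phi[c, a].
    """
    return np.tensordot(context_probs, phi, axes=(0, 0))

rng = np.random.default_rng(0)
n_contexts, n_actions, d = 4, 3, 2
phi = rng.normal(size=(n_contexts, n_actions, d))
theta = np.array([1.0, -0.5])          # illustrative reward parameter
probs = np.array([0.1, 0.2, 0.3, 0.4])  # observed context distribution

psi = expected_features(phi, probs)

# Under a linear reward model, the expected reward of each action is linear
# in the transformed features ...
expected_reward = psi @ theta
# ... and equals the average of the per-context rewards.
direct = probs @ (phi @ theta)
```

Because the two quantities agree, an agent can run a standard linear bandit method on `psi` even though the true context is never observed.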
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Distributed Sensor Networks and Detection Algorithms
