Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits
Amir Leshem

TL;DR
This paper introduces a distributed, privacy-preserving algorithm for max-min fair multi-agent bandit learning that achieves near-optimal regret with polynomial dependence on the number of agents and near-logarithmic dependence on the time horizon.
Contribution
It proposes a novel fully distributed algorithm that preserves reward privacy and improves upon previous methods by reducing the dependence on the number of agents from exponential to polynomial.
Findings
Achieves regret $O(N^3 f(\log T) \log T)$ with unknown reward support.
Maintains reward privacy by avoiding reward sharing among agents.
Simulation results confirm the theoretical scaling of the regret with the time horizon, the number of agents, and the gap.
Abstract
We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning and observe only their own rewards and whether collisions occur when several agents access the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local. We propose a fully distributed algorithm for bounded rewards with unknown support, achieving regret $O(N^3 f(\log T) \log T)$, where $f$ is any nondecreasing diverging function satisfying […]. The algorithm combines distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction for threshold-feasibility tests. Unlike leader-based optimal algorithms, no agent collects the reward observations, empirical…
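To make the max-min objective and the role of threshold-feasibility tests concrete, here is a minimal centralized sketch in Python. It is not the paper's algorithm: it assumes the expected rewards are already known and collected in one matrix `mu` (agents by arms), which is exactly what the distributed, privacy-preserving method avoids, and it replaces the collision-based auction with a standard augmenting-path bipartite matching. It also runs a plain bisection, without the warm start or endpoint revalidation mentioned in the abstract. The names `match_at_threshold` and `max_min_allocation` are hypothetical.

```python
from typing import List, Optional, Tuple


def match_at_threshold(mu: List[List[float]], threshold: float) -> Optional[List[int]]:
    """Feasibility test: can every agent get a distinct arm with reward >= threshold?

    Returns assignment[agent] = arm if feasible, otherwise None.
    Uses augmenting paths (Kuhn's algorithm); this is an illustrative
    stand-in, not the collision-based auction described in the abstract.
    """
    n_agents, n_arms = len(mu), len(mu[0])
    arm_owner = [-1] * n_arms  # arm_owner[arm] = agent currently holding it

    def try_assign(agent: int, visited: List[bool]) -> bool:
        for arm in range(n_arms):
            if mu[agent][arm] >= threshold and not visited[arm]:
                visited[arm] = True
                # Take a free arm, or move the current owner elsewhere.
                if arm_owner[arm] == -1 or try_assign(arm_owner[arm], visited):
                    arm_owner[arm] = agent
                    return True
        return False

    for agent in range(n_agents):
        if not try_assign(agent, [False] * n_arms):
            return None

    assignment = [-1] * n_agents
    for arm, agent in enumerate(arm_owner):
        if agent != -1:
            assignment[agent] = arm
    return assignment


def max_min_allocation(mu: List[List[float]]) -> Tuple[float, List[int]]:
    """Bisection over the distinct reward values for the largest feasible threshold."""
    candidates = sorted({value for row in mu for value in row})
    best = match_at_threshold(mu, candidates[0])
    if best is None:
        raise ValueError("need at least as many arms as agents")
    lo, hi = 0, len(candidates) - 1  # invariant: candidates[lo] is feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        assignment = match_at_threshold(mu, candidates[mid])
        if assignment is not None:
            lo, best = mid, assignment
        else:
            hi = mid - 1
    return candidates[lo], best


# Two agents, two arms: giving agent 0 its best arm would leave agent 1 with 0.1.
value, assignment = max_min_allocation([[0.9, 0.8], [0.7, 0.1]])
print(value, assignment)  # prints: 0.7 [1, 0]
```

Feasibility is monotone in the threshold (lowering it only adds admissible agent-arm pairs), which is why bisection over the candidate values is valid. In the setting of the abstract, each agent would instead compare the threshold against its own local estimates, so no reward values need to leave the agent.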
