Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

Amir Leshem

arXiv:2306.04498·cs.LG·May 5, 2026·1 cites

TL;DR

This paper introduces a distributed, privacy-preserving algorithm for max-min fair multi-agent bandit learning that achieves near-optimal regret with polynomial dependence on the number of agents and near-logarithmic dependence on the time horizon.

Contribution

It proposes a novel fully distributed algorithm that preserves reward privacy and improves upon previous methods by reducing the dependence on the number of agents from exponential to polynomial.

Findings

01

Achieves regret $O(N^3 f(\log T) \log T)$ with unknown reward support.

02

Maintains reward privacy by avoiding reward sharing among agents.

03

Simulation results confirm the theoretical scaling with horizon, agents, and gap.
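
To make the regret bound concrete, here is one worked instance. The abstract requires $f$ to be nondecreasing and diverging with $f(k-1)/f(k) \to 1$. The choice $f(k) = \log k$ is only an illustrative assumption, not a choice stated by the paper:

$$
f(k) = \log k
\;\Longrightarrow\;
\frac{f(k-1)}{f(k)} = \frac{\log(k-1)}{\log k} \to 1,
\qquad
O\!\left(N^{3} f(\log T)\,\log T\right) = O\!\left(N^{3}\,\log\log T\,\log T\right).
$$

By contrast, a geometrically growing function such as $f(k) = 2^{k}$ gives $f(k-1)/f(k) = 1/2$, so it does not satisfy the condition.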

Abstract

We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning and observe only their own rewards and whether collisions occur when several agents access the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local. We propose a fully distributed algorithm for bounded rewards with unknown support, achieving regret $O(N^{3} f(\log T) \log T)$, where $f$ is any nondecreasing diverging function satisfying $f(k-1)/f(k) \to 1$. The algorithm combines distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction for threshold-feasibility tests. Unlike leader-based optimal algorithms, no agent collects the reward observations, empirical…
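
The max-min fair objective and the idea of bisecting over thresholds with a feasibility test can be illustrated with a small centralized sketch. This is not the paper's algorithm: it assumes the mean-reward matrix is fully known in one place (which the paper's privacy model explicitly avoids), and it replaces the collision-based distributed auction with a plain augmenting-path bipartite matching. The function names `has_perfect_matching` and `max_min_value` are hypothetical.

```python
def has_perfect_matching(means, threshold):
    """Feasibility test: can every agent get a distinct arm with mean >= threshold?

    means[n][k] is the (assumed known) mean reward of agent n on arm k.
    Uses augmenting paths (Kuhn's algorithm) as a centralized stand-in
    for a distributed feasibility test.
    """
    n_agents = len(means)
    n_arms = len(means[0])
    arm_owner = [-1] * n_arms  # arm_owner[k] = agent currently holding arm k

    def try_assign(agent, visited):
        for arm in range(n_arms):
            if means[agent][arm] >= threshold and not visited[arm]:
                visited[arm] = True
                # Take a free arm, or displace its owner if the owner can move.
                if arm_owner[arm] == -1 or try_assign(arm_owner[arm], visited):
                    arm_owner[arm] = agent
                    return True
        return False

    return all(try_assign(agent, [False] * n_arms) for agent in range(n_agents))


def max_min_value(means):
    """Largest threshold for which a collision-free assignment exists.

    Feasibility is monotone in the threshold, so bisection over the
    sorted distinct entries of the matrix finds the max-min value.
    """
    candidates = sorted({v for row in means for v in row})
    if not has_perfect_matching(means, candidates[0]):
        raise ValueError("no collision-free assignment (more agents than arms?)")
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_perfect_matching(means, candidates[mid]):
            lo = mid       # threshold is achievable, try higher
        else:
            hi = mid - 1   # threshold is too demanding
    return candidates[lo]


example_means = [
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.1],
    [0.7, 0.6, 0.4],
]
print(max_min_value(example_means))
```

In this example, agents 0 and 1 both have arm 0 as their only arm with mean at least 0.6, so that threshold is infeasible. The assignment agent 0 to arm 2, agent 1 to arm 0, agent 2 to arm 1 guarantees every agent at least 0.5, which is the max-min value.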
