Better Bounds for the Distributed Experts Problem

David P. Woodruff; Samson Zhou

arXiv:2603.09168·cs.LG·March 11, 2026

TL;DR

This paper gives a protocol for the distributed experts problem with $\ell_p$-aggregated losses. It achieves regret $R$ close to $1/\sqrt{T}$ while communicating roughly $(n + s)/R^2$ bits, up to polylogarithmic factors, improving on previous methods in distributed online learning.

Contribution

The paper presents a protocol that lowers both the regret and the communication cost for the distributed experts problem, and extends the analysis to losses aggregated across servers by a general $\ell_p$ norm.

Findings

01

Achieves regret roughly $R \gtrsim \frac{1}{\sqrt{T} \cdot \operatorname{polylog}(nsT)}$, i.e., about $1/\sqrt{T}$ up to polylogarithmic factors.

02

Uses $O\left(\frac{n + s}{R^2}\right) \cdot \max(s^{1-2/p}, 1) \cdot \operatorname{polylog}(nsT)$ bits of communication.

03

Improves on previous bounds for the distributed experts problem.

Abstract

In this paper, we study the distributed experts problem, where $n$ experts are distributed across $s$ servers for $T$ timesteps. The loss of each expert at each time $t$ is the $\ell_p$ norm of the vector that consists of the losses of the expert at each of the $s$ servers at time $t$. The goal is to minimize the regret $R$, i.e., the loss of the distributed protocol compared to the loss of the best expert, amortized over all $T$ timesteps, while using the minimum amount of communication. We give a protocol that achieves regret roughly $R \gtrsim \frac{1}{\sqrt{T} \cdot \operatorname{polylog}(nsT)}$, using $O\left(\frac{n}{R^2} + \frac{s}{R^2}\right) \cdot \max(s^{1-2/p}, 1) \cdot \operatorname{polylog}(nsT)$ bits of communication, which improves on previous work.
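To make the quantities in the abstract concrete, here is a minimal Python sketch of the $\ell_p$-aggregated loss, the amortized regret, and the leading term of the communication bound. It is not the paper's protocol. The function names `aggregate_loss`, `average_regret`, and `communication_leading_term` are hypothetical, and the last one assumes a hidden constant of 1 and drops the polylogarithmic factor, so its output only indicates scaling.

```python
import math


def aggregate_loss(server_losses, p):
    """l_p norm of one expert's per-server losses at a single timestep."""
    if math.isinf(p):
        return max(abs(x) for x in server_losses)
    return sum(abs(x) ** p for x in server_losses) ** (1.0 / p)


def average_regret(losses, picks, p):
    """Amortized regret of a sequence of expert choices.

    losses[t][i][j] is the loss of expert i on server j at time t;
    picks[t] is the index of the expert played at time t.
    """
    T = len(losses)
    n = len(losses[0])
    agg = [[aggregate_loss(losses[t][i], p) for i in range(n)] for t in range(T)]
    played = sum(agg[t][picks[t]] for t in range(T))
    best_fixed = min(sum(agg[t][i] for t in range(T)) for i in range(n))
    return (played - best_fixed) / T


def communication_leading_term(n, s, R, p):
    """(n/R^2 + s/R^2) * max(s^(1-2/p), 1).

    Assumption: constant factor 1 and no polylog(nsT) term, so this is
    a scaling illustration, not an actual bit count.
    """
    return (n / R**2 + s / R**2) * max(s ** (1 - 2 / p), 1)
```

For $p \le 2$ the factor $\max(s^{1-2/p}, 1)$ equals 1, so in this sketch the extra dependence on the number of servers only appears for $p > 2$.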

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 8·Confidence 4

Strengths

Strength #1: The paper is extremely well written, communicating complicated ideas in an incremental fashion. The main ideas of the theory are presented in a manner that allows the proofs to be checked easily.

Strength #2: The paper improves upon previous work by a) generalizing to arbitrary p-norms, b) attaining the regret-sensitive rate $s/R^2$ in the dependence on the number of servers, and c) giving an algorithm that improves the dependence on the number of servers by a factor of $\max(s^{1-2/p}, 1)$.

Stre

Weaknesses

Weakness #1: The trick to trade off communication against regret is somewhat artificial, requiring that all servers are silent on some rounds. While the theory works out, I wonder if there is a more natural algorithm that would yield further improvement.

Weakness #2: This is largely a theory paper exploring communication-regret trade-offs in expert learning. It may be a little outside the primary areas of interest for ICLR.

Reviewer 02·Rating 6·Confidence 3

Strengths

- This is the first work to analyze distributed experts in the coordinator model for general $\ell_p$ losses.
- The embedding of $\ell_p$ losses into $\ell_\infty$ through exponential random variables, combined with a geometric mean estimator for variance reduction, represents a technically sophisticated and creative approach.
- The presentation of three successive algorithms (Algorithms 2--4), a warm-up, a parameterized version that achieves regret-communication tradeoff, and the fin
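The embedding the reviewer mentions rests on a standard fact: if $E_1, \dots, E_s$ are i.i.d. $\mathrm{Exp}(1)$ variables, then $\max_i |x_i| / E_i^{1/p}$ has the same distribution as $\|x\|_p / E^{1/p}$ for a single $E \sim \mathrm{Exp}(1)$. The sketch below checks this numerically. It is an illustration only, not the paper's algorithm: it swaps in a plain sample median (rescaled by $(\ln 2)^{1/p}$, since the median of $\mathrm{Exp}(1)$ is $\ln 2$) in place of the geometric mean estimator, and the function names are hypothetical.

```python
import math
import random
import statistics


def lp_norm(x, p):
    """Exact l_p norm, used here only as a reference value."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def embedded_linf(x, p, rng):
    """Scale coordinate i by 1 / E_i^(1/p) with E_i ~ Exp(1), then take the max."""
    return max(abs(v) / rng.expovariate(1.0) ** (1.0 / p) for v in x)


def estimate_lp_norm(x, p, trials=20000, seed=0):
    """Median-based estimate of ||x||_p from repeated l_infinity embeddings.

    Assumption: a simple median estimator stands in for the geometric
    mean estimator described in the review.
    """
    rng = random.Random(seed)
    samples = [embedded_linf(x, p, rng) for _ in range(trials)]
    # The max is distributed as ||x||_p / E^(1/p), whose median is
    # ||x||_p / (ln 2)^(1/p); multiply back to undo that factor.
    return statistics.median(samples) * math.log(2) ** (1.0 / p)
```

Calling `estimate_lp_norm([1, 2, 3, 4, 5], p=3)` should land close to `lp_norm([1, 2, 3, 4, 5], 3)`. Only the maximum of the scaled coordinates enters each sample, which is the sense in which an $\ell_p$ computation is turned into an $\ell_\infty$ one.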

Weaknesses

- The authors state that it is information-theoretically impossible to achieve regret smaller than $O(1/\sqrt{T})$, but then compare their method to the algorithm of [1] for $R = O(1)$, claiming improved communication in that regime. This causes confusion as $R = O(1)$ is not achievable. Instead, taking $R \ge 1/\sqrt{T}$ seems to reproduce the communication cost of [1], showing no improvement compared to [1]. The claimed improvement in communication complexity should be either removed or clarif

Reviewer 03·Rating 2·Confidence 4

Strengths

I found the paper to be quite well motivated, from both a theoretical and a practical perspective. I think that the algorithmic ideas in the paper are nice and appear to be a natural approach for the problem.

Weaknesses

My main concern about the paper is that there are some important steps in the proof where I have doubts about the correctness of the argument. (1) This is probably my most important concern and is about the important lemma 3.3. The proof is quite sketchy. It is stated that a conditional expectation (l755) equality holds for all realizations of $p_t$. Realizations of what? I assume of $p(t)$, but what is $p(t)$? I assume the probability vector of playing each expert at time $t$ because then the

Taxonomy

Topics: Optimization and Search Problems · Advanced Bandit Algorithms Research · Game Theory and Applications