On Fine-Grained Distinct Element Estimation
Ilias Diakonikolas, Daniel M. Kane, Jasper C.H. Lee, Thanasis Pittas, David P. Woodruff, Samson Zhou

TL;DR
This paper introduces a new parameterization for distributed distinct element estimation, leading to more efficient algorithms that outperform previous bounds when the number of collisions is small, and provides matching lower bounds.
Contribution
The authors propose a collision-based parameterization and develop algorithms with improved communication complexity, establishing the number of pairwise collisions $C$ as a tight measure of the problem's communication cost and extending the approach to streaming scenarios.
Findings
New collision-based parameterization improves efficiency when collisions are few.
Algorithms match lower bounds across different regimes, confirming tightness.
The techniques extend to streaming algorithms for high-frequency items.
Abstract
We study the problem of distributed distinct element estimation, where servers each receive a subset of a universe and aim to compute an approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound on the number of bits required, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol whose communication cost in bits breaks previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide…
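To make the collision parameter concrete, the sketch below computes the exact number of distinct elements and the number of pairwise collisions for a small set of servers. It is an illustration only, not the paper's protocol: the function name `distinct_and_collisions` is hypothetical, and it assumes that an element held by k servers contributes one collision per unordered pair of those servers.

```python
from collections import Counter
from math import comb


def distinct_and_collisions(server_sets):
    """Return (number of distinct elements, number of pairwise collisions).

    Assumption (not confirmed by the source): an element held by k servers
    counts as comb(k, 2) collisions, one per unordered pair of servers.
    """
    # For each element, count how many servers hold it.
    holders = Counter(x for s in server_sets for x in set(s))
    distinct = len(holders)
    collisions = sum(comb(k, 2) for k in holders.values())
    return distinct, collisions


# Three servers: element 1 is shared by two servers, element 3 by all three.
servers = [{1, 2, 3}, {3, 4}, {3, 5, 1}]
distinct, collisions = distinct_and_collisions(servers)
print(distinct, collisions)
```

When there are no collisions at all, the distinct count equals the sum of the servers' set sizes, so each server only needs to report its size. This is one way to see why a small collision count can make the problem easier than the worst case.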
Taxonomy
Topics: Distributed systems and fault tolerance · Complexity and Algorithms in Graphs · Advanced Database Systems and Queries
