An embarrassingly parallel optimal-space cardinality estimation algorithm
Emin Karayel

TL;DR
This paper introduces a parallelizable algorithm for cardinality estimation that matches the optimal space complexity of previous sequential solutions, enabling efficient distributed processing while simplifying implementation.
Contribution
It presents a new mergeable algorithm that retains the optimal space bound and improves on prior solutions by simplifying the implementation and requiring fewer pseudo-random objects.
Findings
Achieves optimal space complexity in parallel and sequential settings.
Enables distributed processing through mergeability.
Simplifies implementation with fewer pseudo-random objects.
Abstract
In 2020 Blasiok (ACM Trans. Algorithms 16(2) 3:1-3:28) constructed an optimal space streaming algorithm for the cardinality estimation problem with a space complexity of $O(\varepsilon^{-2} \ln(\delta^{-1}) + \ln n)$, where $\varepsilon$, $\delta$ and $n$ denote the relative accuracy, failure probability and universe size, respectively. However, his solution requires the stream to be processed sequentially. On the other hand, there are algorithms that admit a merge operation; they can be used in a distributed setting, allowing parallel processing of sections of the stream, and are highly relevant for large-scale distributed applications. The best-known such algorithm, unfortunately, has a space complexity exceeding $\Omega(\ln(\delta^{-1}) (\varepsilon^{-2} \ln \ln n + \ln n))$. This work presents a new algorithm that improves on the solution by Blasiok, preserving its space…
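To make the notion of a merge operation concrete, here is a minimal Python sketch of a mergeable cardinality estimator. It is not the algorithm of this paper and does not achieve the optimal space bound; it uses the classic K-Minimum-Values idea purely as an illustration. The class name `KMVSketch`, the parameter `k`, and the choice of SHA-256 as the hash function are assumptions made for this example.

```python
import hashlib
import heapq


class KMVSketch:
    """Illustrative K-Minimum-Values sketch (hypothetical helper, not the paper's method).

    Keeps the k smallest distinct hash values seen so far.
    """

    def __init__(self, k=256):
        self.k = k
        self.values = set()  # at most k hash values, scaled to [0, 1]

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform number using the first 8 bytes of SHA-256.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            # O(k) eviction of the largest value; kept simple for clarity.
            self.values.remove(max(self.values))

    def merge(self, other):
        # Both sketches must use the same k and the same hash function.
        merged = KMVSketch(self.k)
        merged.values = set(heapq.nsmallest(self.k, self.values | other.values))
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct hashes seen
        return (self.k - 1) / max(self.values)


# Two workers process overlapping sections of the stream independently.
left, right, whole = KMVSketch(), KMVSketch(), KMVSketch()
for x in range(0, 6000):
    left.add(x)
for x in range(4000, 10000):
    right.add(x)
for x in range(10000):
    whole.add(x)

merged = left.merge(right)
print(merged.estimate(), whole.estimate())
```

The merged sketch holds exactly the same state as a sketch that processed the whole stream sequentially, which is the property that makes such algorithms usable in a distributed setting. Duplicates across sections do no harm, because equal items hash to equal values.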
