Distinct Elements in Streams: An Algorithm for the (Text) Book
Sourav Chakraborty, N.V. Vinodchandran, Kuldeep S. Meel

TL;DR
This paper introduces a simple, accessible, sampling-based algorithm for estimating the number of distinct elements in a data stream, making the topic more approachable for undergraduates.
Contribution
The paper presents a novel, easy-to-understand algorithm for the Distinct Elements problem that does not rely on pairwise independence or universal hash functions, which makes it suitable for teaching.
Findings
The algorithm is space-efficient and simple to implement.
Its description and proof are accessible to undergraduates with basic probability knowledge.
It provides estimates with provable accuracy guarantees.
Abstract
Given a data stream $\mathcal{A} = \langle a_1, a_2, \ldots, a_m \rangle$ of $m$ elements where each $a_i \in [n]$, the Distinct Elements problem is to estimate the number of distinct elements in $\mathcal{A}$. Distinct Elements has been the subject of theoretical and empirical investigation over the past four decades, resulting in space-optimal algorithms for it. All current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based, space-efficient algorithm whose description and proof are accessible to undergraduates with knowledge of basic probability theory.
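The abstract describes the algorithm only at a high level. Below is a minimal Python sketch of the sampling idea as we understand it from the paper, not the authors' code. It keeps a set X of stream elements, each retained with the current probability p. Whenever X reaches a threshold, it discards each element with probability 1/2 and halves p. At the end it reports |X|/p. The names `cvm_estimate` and `default_threshold` are hypothetical. The constant in `default_threshold` (⌈12/ε² · log₂(8m/δ)⌉) is our recollection of the paper's setting and should be treated as an assumption.

```python
import math
import random
from typing import Hashable, Iterable, Optional


def default_threshold(eps: float, delta: float, m: int) -> int:
    """Hypothetical helper: sample-size threshold from accuracy eps,
    failure probability delta and stream length m (constants assumed)."""
    return math.ceil(12 / eps**2 * math.log2(8 * m / delta))


def cvm_estimate(stream: Iterable[Hashable], thresh: int,
                 rng: Optional[random.Random] = None) -> Optional[float]:
    """Estimate the number of distinct elements in `stream` while storing
    at most `thresh` elements. Returns None on the failure event."""
    if rng is None:
        rng = random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # the sampled set X
    for a in stream:
        # Forget any earlier decision about a, then re-sample it with prob p.
        sample.discard(a)
        if rng.random() < p:
            sample.add(a)
        if len(sample) == thresh:
            # Sample is full: keep each element independently with prob 1/2.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) == thresh:
                return None  # nothing was discarded: report failure
    # Each distinct element is in X with probability p, so rescale.
    return len(sample) / p


if __name__ == "__main__":
    gen = random.Random(0)
    stream = [gen.randrange(50_000) for _ in range(200_000)]
    print(len(set(stream)), cvm_estimate(stream, thresh=1_000, rng=random.Random(1)))
```

Because the sample is capped at `thresh` elements, memory depends on the threshold rather than on the number of distinct elements. The `None` return stands in for the algorithm's failure output, which should occur only with small probability.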
