An Optimal Algorithm for l1-Heavy Hitters in Insertion Streams and Related Problems
Arnab Bhattacharyya, Palash Dey, David P. Woodruff

TL;DR
This paper presents an optimal algorithm for identifying $\ell_1$-heavy hitters in insertion data streams, achieving tight space bounds and efficient processing, and extends to related frequency estimation problems.
Contribution
The authors develop the first algorithm for $\ell_1$-heavy hitters in insertion streams that is optimal in both space and update time, together with a lower bound proof confirming its optimality.
Findings
Uses $O\left(\frac{1}{\epsilon}\log\frac{1}{\phi} + \phi^{-1}\log n + \log\log m\right)$ bits of space.
Processes each update in $O(1)$ worst-case time.
Can estimate the maximum frequency within additive $\epsilon m$ error.
Abstract
We give the first optimal bounds for returning the $\ell_1$-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \ldots, n\}$ and parameters $0 < \epsilon < \phi \leq 1$, let $f_i$ denote the frequency of item $i$, i.e., the number of times item $i$ occurs in the stream. With arbitrarily large constant probability, our algorithm returns all items $i$ for which $f_i \geq \phi m$, returns no items $j$ for which $f_j \leq (\phi - \epsilon) m$, and returns approximations $\tilde{f}_i$ with $|\tilde{f}_i - f_i| \leq \epsilon m$ for each item $i$ that it returns. Our algorithm uses $O(\epsilon^{-1}\log\phi^{-1} + \phi^{-1}\log n + \log\log m)$ bits of space, processes each stream update in $O(1)$ worst-case time, and can report its output in time linear in the output size.…
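To make the $(\epsilon, \phi)$ guarantee concrete, here is a minimal sketch using the classical Misra-Gries counter scheme. This is not the paper's algorithm: it is a well-known deterministic baseline that meets the same output guarantee but uses more space and does not have $O(1)$ worst-case update time. The function name `misra_gries_heavy_hitters` and its parameters are illustrative assumptions, not taken from the paper.

```python
import math


def misra_gries_heavy_hitters(stream, eps, phi):
    """Baseline (eps, phi)-heavy hitters via Misra-Gries (NOT the paper's method).

    Keeps k = ceil(1/eps) counters. Each stored count undercounts the true
    frequency by at most m / (k + 1) < eps * m and never overcounts.
    """
    k = math.ceil(1 / eps)
    counters = {}
    m = 0  # stream length seen so far
    for item in stream:
        m += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Table is full: decrement every counter and drop those that hit zero.
            # This step costs O(k) time, unlike the O(1) update in the paper.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Any item with f_i >= phi * m has an estimate strictly above (phi - eps) * m,
    # while any item with f_j <= (phi - eps) * m cannot exceed this threshold.
    threshold = (phi - eps) * m
    return {x: c for x, c in counters.items() if c > threshold}, m


# Usage: item 1 occurs 50 times, item 2 occurs 30 times, 20 other items once each.
stream = [1] * 50 + [2] * 30 + list(range(3, 23))
heavy, m = misra_gries_heavy_hitters(stream, eps=0.1, phi=0.25)
print(heavy, m)
```

In this sketch the reported counts are underestimates within $\epsilon m$ of the true frequencies, which matches the approximation requirement stated in the abstract; the contribution of the paper is achieving the same guarantee with optimal space and constant update time.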
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Algorithms and Data Compression
