Efficient Algorithm for Deterministic Search of Hot Elements
Dariusz R. Kowalski, Dominik Pajak

TL;DR
This paper introduces a new deterministic online algorithm for identifying frequent elements in large data streams with insertions and deletions. It uses memory that is optimal up to polylogarithmic factors and polylogarithmic time per element, without randomness or multiple passes.
Contribution
It presents the first truly online deterministic algorithm for frequent-element detection whose memory usage is within polylogarithmic factors of the proven lower bound, improving over prior randomized or multi-pass methods.
Findings
Uses $O(\min(n, \frac{\mathrm{polylog}(n)}{\epsilon}))$ memory
Operates in $O(\mathrm{polylog}(n))$ time per element
Establishes a lower bound of $\Omega(\min(n, \frac{1}{\epsilon}))$ on memory requirements
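To make "optimal up to polylogarithmic factors" concrete, the sketch below evaluates the two memory expressions from the findings for sample values of $n$ and $\epsilon$, dropping constant factors. It is purely illustrative: the function name `memory_bounds` and the choice $\mathrm{polylog}(n) = \log_2(n)^k$ with $k = 2$ are assumptions, since the summary does not state the exponent.

```python
import math


def memory_bounds(n: int, eps: float, k: int = 2) -> tuple:
    """Compare the upper bound O(min(n, polylog(n)/eps)) with the lower bound
    Omega(min(n, 1/eps)), ignoring constant factors.

    Assumption (illustration only): polylog(n) is taken to be log2(n) ** k.
    """
    polylog = math.log2(n) ** k
    upper = min(n, polylog / eps)  # memory used by the algorithm, up to constants
    lower = min(n, 1 / eps)        # memory required by the lower bound, up to constants
    return upper, lower


# Large universe, moderate accuracy: both bounds are far below n,
# and they differ by the polylog factor log2(n) ** 2 = 400.
print(memory_bounds(2**20, 0.01))
# Small universe, fine accuracy: both bounds collapse to n (one counter per element).
print(memory_bounds(1000, 0.0005))
```

Under these assumptions, the gap between the two bounds never exceeds the polylogarithmic factor, and once $1/\epsilon$ exceeds $n$ both sides are simply $n$.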
Abstract
When facing a very large stream of data, it is often desirable to extract the most important statistics online, quickly, and using little memory. For example, one may want to quickly find the most influential users generating posts online, or check whether the stream contains many identical elements. In this paper, we study streams containing insertions and deletions of elements from a possibly large set of size $n$, processed by online deterministic algorithms. At any point in the stream, the algorithm may be queried to output elements of a certain frequency in the already processed stream; more precisely, the most frequent elements in the stream so far. The output is considered correct if it contains all elements with frequency greater than a given parameter and no element with frequency smaller than . We present an…
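The correctness rule in the abstract can be checked against a brute-force reference. The sketch below is not the paper's algorithm: it keeps one exact counter per element (the trivial $O(n)$-memory solution) and then tests whether a proposed answer satisfies the rule. Several details are assumptions for illustration: updates are `(element, +1)` or `(element, -1)` pairs, deletions never remove an absent element, "frequency" means net count divided by the current stream size, and the two thresholds are given the hypothetical names `high` and `low` (with `low <= high`).

```python
from collections import Counter
from typing import Hashable, Iterable, Set, Tuple

# One stream update: (element, +1) for an insertion, (element, -1) for a deletion.
Update = Tuple[Hashable, int]


def exact_counts(stream: Iterable[Update]) -> Counter:
    """Brute-force reference: one exact counter per distinct element (O(n) memory).
    Assumes deletions only remove elements that are currently present."""
    counts: Counter = Counter()
    for element, delta in stream:
        counts[element] += delta
        if counts[element] == 0:
            del counts[element]  # insertions and deletions cancelled out
    return counts


def frequencies(counts: Counter) -> dict:
    """Assumed definition: net count divided by the current stream size."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {e: c / total for e, c in counts.items()}


def is_valid_answer(answer: Set[Hashable], counts: Counter,
                    high: float, low: float) -> bool:
    """Correctness rule with hypothetical thresholds low <= high: every element
    with frequency > high must be reported, and no reported element may have
    frequency < low. Elements in between may be reported or omitted."""
    freq = frequencies(counts)
    must_report = {e for e, f in freq.items() if f > high}
    forbidden = {e for e in answer if freq.get(e, 0.0) < low}
    return must_report <= answer and not forbidden


stream = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", -1)]
counts = exact_counts(stream)  # "b" cancels out; "a" has 3 of 4 remaining items
print(is_valid_answer({"a"}, counts, high=0.5, low=0.3))       # "a" is required
print(is_valid_answer({"a", "c"}, counts, high=0.5, low=0.3))  # "c" is too rare
```

The gap between `low` and `high` is what gives a small-memory algorithm room to be approximate: elements whose frequency falls between the two thresholds may be reported or omitted.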
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Complexity and Algorithms in Graphs
