Double-Hashing Algorithm for Frequency Estimation in Data Streams
Nikita Seleznev, Senthil Kumar, and C. Bayan Bruss

TL;DR
This paper introduces a double-hashing algorithm that improves frequency estimation in data streams by reducing hash table collisions. The approach improves accuracy and reduces complexity without relying on machine learning models.
Contribution
The paper presents a novel double-hashing approach that dynamically optimizes streaming algorithms for frequency estimation based on stream properties, improving accuracy and efficiency.
Findings
Improved frequency estimation accuracy on synthetic and real data.
Reduced hash collisions by separating heavy hitters.
Applicable to various data streams without additional models.
Abstract
Frequency estimation of elements is an important task for summarizing data streams and for machine learning applications. The problem is often addressed with streaming algorithms that use sublinear-space data structures. These algorithms can process large volumes of data while using limited storage. Commonly used streaming algorithms, such as the count-min sketch, have many advantages, but they do not exploit the properties of a data stream to optimize performance. In the present paper we introduce a novel double-hashing algorithm that provides the flexibility to optimize streaming algorithms depending on the properties of a given stream. In the double-hashing approach, a standard streaming algorithm is first employed to obtain an estimate of the element frequencies. This estimate is derived from a fraction of the stream and allows identification of the heavy hitters. Next, it uses a…
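The abstract builds on the count-min sketch and describes a first pass over a fraction of the stream that identifies heavy hitters; the remaining steps are cut off. The Python sketch below is a hedged illustration under stated assumptions, not the authors' algorithm. `CountMinSketch` is a textbook count-min sketch, while `TwoStageEstimator`, `prefix_len`, and `num_heavy` are hypothetical names. It assumes the prefix fits in memory and that the identified heavy hitters get exact dedicated counters while all other items share a separate sketch, which is one plausible way of "separating heavy hitters" to reduce collisions.

```python
import hashlib
from typing import Dict, List, Optional


class CountMinSketch:
    """Textbook count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row number.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


class TwoStageEstimator:
    """Hypothetical two-stage estimator; an illustration, not the paper's method."""

    def __init__(self, width: int, depth: int, prefix_len: int, num_heavy: int) -> None:
        self.width, self.depth = width, depth
        self.prefix_len = prefix_len
        self.num_heavy = num_heavy
        self.buffer: List[str] = []  # stage 1: raw prefix (assumed to fit in memory)
        self.heavy: Optional[Dict[str, int]] = None  # stage 2: exact heavy-hitter counters
        self.light = CountMinSketch(width, depth)  # stage 2: sketch for all other items

    def add(self, item: str) -> None:
        if self.heavy is None:
            self.buffer.append(item)
            if len(self.buffer) == self.prefix_len:
                self._start_stage_two()
        else:
            self._route(item)

    def _start_stage_two(self) -> None:
        # Stage 1: estimate prefix frequencies with a probe sketch.
        probe = CountMinSketch(self.width, self.depth)
        for item in self.buffer:
            probe.add(item)
        # Rank distinct prefix items by estimate; the inner sort makes ties deterministic.
        ranked = sorted(sorted(set(self.buffer)), key=probe.estimate, reverse=True)
        self.heavy = {item: 0 for item in ranked[: self.num_heavy]}
        # Replay the buffered prefix so no counts are lost, then free the buffer.
        for item in self.buffer:
            self._route(item)
        self.buffer = []

    def _route(self, item: str) -> None:
        if item in self.heavy:
            self.heavy[item] += 1
        else:
            self.light.add(item)

    def estimate(self, item: str) -> int:
        if self.heavy is None:  # stream ended before the prefix filled up
            return self.buffer.count(item)
        if item in self.heavy:
            return self.heavy[item]
        return self.light.estimate(item)


# Usage: "cat" and "dog" dominate the stream; each "rare{i}" appears once.
stream: List[str] = []
for i in range(40):
    stream += ["cat", "cat", "dog", f"rare{i}"]

est = TwoStageEstimator(width=64, depth=4, prefix_len=40, num_heavy=2)
for token in stream:
    est.add(token)

print(est.estimate("cat"), est.estimate("dog"))
```

Because count-min counters can only overcount, giving the few dominant items their own counters keeps their mass out of the cells that light items hash into. The cost in this sketch is the memory for the prefix buffer and the exact counters.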
Taxonomy
Topics: Caching and Content Delivery · Data Stream Mining Techniques · Network Security and Intrusion Detection
