A new Frequency Estimation Sketch for Data Streams
Ning Li

TL;DR
This paper introduces a novel frequency estimation sketch tailored for high-speed data streams, aiming to improve accuracy and efficiency in real-time item frequency estimation across various applications.
Contribution
The paper proposes a new sketch data structure specifically designed for frequency estimation in data streams, enhancing existing probabilistic methods with improved accuracy and computational efficiency.
Findings
Demonstrates superior accuracy over existing sketches in experiments
Reduces memory usage while maintaining estimation precision
Applicable to diverse high-speed data stream scenarios
Abstract
In data stream applications, one of the critical issues is estimating the frequency of each item in a given multiset, that is, a set in which each item can appear multiple times. In many applications the data streams are high-speed and carry massive amounts of data; examples include real-time IP traffic, graph streams, web clicks and crawls, sensor databases, and natural language processing (NLP) [2][6]. In these applications, the servers need to record the stream information in real time. However, because these streams arrive at such high speed, recording and estimating item frequencies exactly is impractical. An alternative approach is to estimate item frequencies with probabilistic data structures, and this approach has been widely used for estimation over high-speed data streams [7][9]. Sketches is…
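The abstract refers to probabilistic data structures (sketches) for estimating item frequencies but does not describe the proposed sketch itself. As a point of reference only, the following is a minimal Python sketch of the classic Count-Min sketch (Cormode and Muthukrishnan), a widely used baseline of this kind. It is not the paper's data structure. The class and method names (`CountMinSketch`, `update`, `estimate`) and the use of salted BLAKE2b hashes, one per row, are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Classic Count-Min sketch: a depth x width array of counters.

    Illustrative baseline only; not the sketch proposed in the paper.
    """

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumption: derive an independent-looking hash per row by salting
        # BLAKE2b with the row number (BLAKE2b accepts salts up to 16 bytes).
        h = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
        )
        return int.from_bytes(h.digest(), "little") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Each arrival increments one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only add to counters, so every row overestimates;
        # the minimum over rows is the tightest available estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 3 + ["c"]
    cms = CountMinSketch(width=64, depth=4)
    for x in stream:
        cms.update(x)
    for x in ("a", "b", "c", "unseen"):
        print(x, cms.estimate(x))
```

Because counters are only ever incremented, the estimate never falls below the true count. For the standard Count-Min analysis, choosing width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉ bounds the overestimate by εN with probability at least 1 − δ, where N is the total stream count. Memory is fixed at w · d counters regardless of how many distinct items arrive, which is the accuracy-versus-memory trade-off that frequency sketches aim to improve.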
Taxonomy
Topics: Caching and Content Delivery · Data Stream Mining Techniques · Web Data Mining and Analysis
