Fast and Accurate Mining of Correlated Heavy Hitters
Italo Epicoco, Massimo Cafaro, Marco Pulimeno

TL;DR
This paper introduces a new counter-based algorithm for mining correlated heavy hitters in data streams, demonstrating improved accuracy, speed, and space efficiency over previous methods through theoretical proofs and extensive experiments.
Contribution
The paper presents a novel counter-based algorithm for correlated heavy hitter mining, with proven error bounds and superior performance compared to existing algorithms.
Findings
Outperforms the Misra--Gries based algorithm of Lahiri et al. in both accuracy and speed
Requires asymptotically much less space than that baseline
Error bounds and correctness are formally proven
Abstract
The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional data stream has been introduced recently, and a deterministic algorithm based on the use of the Misra--Gries algorithm has been proposed by Lahiri et al. to solve it. In this paper we present a new counter-based algorithm for tracking CHHs, formally prove its error bounds and correctness and show, through extensive experimental results, that our algorithm outperforms the Misra--Gries based algorithm with regard to accuracy and speed whilst requiring asymptotically much less space.
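For readers unfamiliar with the baseline's building block, the following is a minimal sketch of the classic Misra--Gries frequent-items summary for a one-dimensional stream. It is not the counter-based algorithm introduced in this paper, nor the two-dimensional construction of Lahiri et al.; the function name `misra_gries` and the parameter `k` are illustrative choices, not names taken from the paper.

```python
def misra_gries(stream, k):
    """Classic Misra-Gries summary using at most k - 1 counters.

    For a stream of n items, each estimate c satisfies
    f - n/k <= c <= f, where f is the item's true frequency
    (an item with no counter has estimate 0).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop
            # those that reach zero. The incoming item is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Any item whose true frequency exceeds n/k is guaranteed to survive in the summary, which is why summaries of this kind can serve as a deterministic basis for heavy-hitter tracking.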
Taxonomy
Topics: Data Mining Algorithms and Applications · Algorithms and Data Compression · Data Management and Algorithms
