Finding Subcube Heavy Hitters in Analytics Data Streams
Branislav Kveton, S. Muthukrishnan, Hoa T. Vu, Yikun Xian

TL;DR
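This paper introduces streaming algorithms for identifying heavy hitters over subcubes (subsets of coordinates) of high-dimensional data streams. Under a Naive Bayes model of the data, they reduce the space requirement from quadratic in the dimension $d$ (the worst case $k = d$ of the general $\tilde{O}(kd/\gamma)$ bound) to linear in $d$.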
Contribution
The paper presents a model-based approach that exploits Naive Bayes assumptions to develop a two-pass algorithm with reduced space complexity for subcube heavy hitters.
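To see why a Naive Bayes assumption helps, note that if one coordinate acts as the class variable $c$ and the other coordinates are conditionally independent given $c$, then the subcube frequency factorizes as $f_T(v) \approx \sum_c P(c) \prod_{i \in T} P(x_i = v_i \mid c)$, which only needs per-coordinate statistics rather than statistics for every subcube. The sketch below is a hypothetical illustration of that factorization, not the authors' two-pass algorithm: the class `NaiveBayesSubcube` and its `parent` parameter are made up here, the class variable is assumed to be an observed coordinate (the latent-dimension case is not covered), and it keeps exact counts instead of the small-space structures a real streaming algorithm would use.

```python
from collections import Counter, defaultdict
from typing import Sequence, Tuple

Item = Tuple[int, ...]


class NaiveBayesSubcube:
    """Hypothetical estimator of f_T(v) under a Naive Bayes model.

    Coordinate `parent` is assumed to be the (observed) class variable c;
    all other coordinates are assumed conditionally independent given c.
    Exact counts are kept for clarity, so this is NOT small-space.
    """

    def __init__(self, parent: int):
        self.parent = parent
        self.m = 0                               # items seen
        self.class_counts = Counter()            # c -> #items with class c
        self.cond_counts = defaultdict(Counter)  # (i, c) -> Counter of x_i values

    def update(self, x: Item) -> None:
        c = x[self.parent]
        self.m += 1
        self.class_counts[c] += 1
        for i, xi in enumerate(x):
            if i != self.parent:
                self.cond_counts[(i, c)][xi] += 1

    def estimate(self, T: Sequence[int], v: Sequence[int]) -> float:
        """Return sum_c P(c) * prod_{t in T} P(x_t = v_t | c)."""
        total = 0.0
        for c, nc in self.class_counts.items():
            p = nc / self.m
            for t, vt in zip(T, v):
                if t == self.parent:
                    # Fixing the class coordinate keeps only the matching c.
                    p *= 1.0 if vt == c else 0.0
                else:
                    p *= self.cond_counts[(t, c)][vt] / nc
            total += p
        return total
```

When the independence assumption holds exactly the estimate is exact; when it fails, the estimate reflects model error. For example, on the two items `(0, 0, 0)` and `(0, 1, 1)` with `parent=0`, the true frequency of joint value `(0, 0)` on coordinates `(1, 2)` is 0.5, but the factorized estimate is 0.25.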
Findings
Achieves $\tilde{O}(kd/\gamma)$ space complexity for the general problem.
Provides a two-pass $\tilde{O}(d/\gamma)$-space algorithm under Naive Bayes assumptions.
Develops a fast method for answering the all subcube heavy hitters query in $O(k/\gamma^2)$ time.
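The findings state the space bounds without the mechanism. As an illustration only, the sketch below assumes the general $\tilde{O}(kd/\gamma)$ bound can be approached by keeping a uniform reservoir sample of whole $d$-dimensional items and answering each query from the sample: a union bound over at most $(nd)^k$ pairs $(T, v)$ suggests a sample of about $k \log(nd)/\gamma$ items, each storing $d$ coordinates. The class `ReservoirSubcubeHH`, the helper `sample_size`, its constant `c=8.0`, and the YES threshold $\gamma/2$ (any cut between $\gamma/4$ and $\gamma$ would do) are hypothetical choices, not the paper's construction.

```python
import math
import random
from typing import List, Sequence, Tuple

Item = Tuple[int, ...]


def sample_size(k: int, d: int, n: int, gamma: float, c: float = 8.0) -> int:
    """Hypothetical sample size ~ c * k * log(n*d) / gamma.

    The k * log(n*d) term comes from a union bound over the at most
    (n*d)^k (subcube, value) pairs; c is an illustrative constant.
    """
    return math.ceil(c * k * math.log(n * d) / gamma)


class ReservoirSubcubeHH:
    """One-pass uniform reservoir sample of whole items (Algorithm R)."""

    def __init__(self, sample_size: int, seed: int = 0):
        self.s = sample_size
        self.sample: List[Item] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def update(self, x: Item) -> None:
        # After `seen` items, each item is in the sample with prob. s / seen.
        self.seen += 1
        if len(self.sample) < self.s:
            self.sample.append(x)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.s:
                self.sample[j] = x

    def estimate(self, T: Sequence[int], v: Sequence[int]) -> float:
        """Fraction of sampled items whose coordinates T equal v."""
        key = tuple(v)
        hits = sum(1 for x in self.sample if tuple(x[t] for t in T) == key)
        return hits / len(self.sample)

    def query(self, T: Sequence[int], v: Sequence[int], gamma: float) -> bool:
        # gamma / 2 lies strictly between the NO cut (gamma/4) and YES cut (gamma).
        return self.estimate(T, v) >= gamma / 2
```

Memory is dominated by the sample: `s` items of `d` coordinates each, so $O(sd)$ words. If the stream is shorter than `s`, the sample is the whole stream and the estimates are exact.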
Abstract
Items in data streams typically have a large number of dimensions. We study the fundamental heavy-hitters problem in this setting. Formally, the data stream consists of $d$-dimensional items $x_1, \ldots, x_m \in [n]^d$. A $k$-dimensional subcube $T$ is a subset of $k$ distinct coordinates $\{T_1, \ldots, T_k\} \subseteq [d]$. A subcube heavy hitter query $\mathrm{Query}(T, v)$, $v \in [n]^k$, outputs YES if $f_T(v) \geq \gamma$ and NO if $f_T(v) < \gamma/4$, where $f_T(v)$ is the fraction of stream items whose coordinates $T$ have joint values $v$. The all subcube heavy hitters query $\mathrm{AllQuery}(T)$ outputs all joint values $v$ that return YES to $\mathrm{Query}(T, v)$. The one-dimensional version of this problem, where $d = 1$, has been heavily studied in data stream theory, databases, networking and signal processing. The subcube heavy hitters problem is applicable in all these cases. We present a…
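The query semantics can be pinned down with an exact, offline reference: compute $f_T(v)$ by scanning all items. Answering YES exactly when $f_T(v) \geq \gamma$ is one valid response, since the definition allows either answer when $\gamma/4 \leq f_T(v) < \gamma$. This is a minimal sketch with hypothetical function names; coordinates are 0-indexed here rather than drawn from $[d] = \{1, \ldots, d\}$, and it uses linear space, which is exactly what streaming algorithms avoid.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Item = Tuple[int, ...]


def subcube_frequency(stream: Sequence[Item], T: Sequence[int], v: Sequence[int]) -> float:
    """f_T(v): fraction of items whose coordinates T take joint value v."""
    key = tuple(v)
    hits = sum(1 for x in stream if tuple(x[t] for t in T) == key)
    return hits / len(stream)


def query(stream: Sequence[Item], T: Sequence[int], v: Sequence[int], gamma: float) -> bool:
    """Exact reference answer: YES iff f_T(v) >= gamma."""
    return subcube_frequency(stream, T, v) >= gamma


def all_query(stream: Sequence[Item], T: Sequence[int], gamma: float) -> Set[Tuple[int, ...]]:
    """All joint values v on subcube T with f_T(v) >= gamma."""
    counts = Counter(tuple(x[t] for t in T) for x in stream)
    m = len(stream)
    return {v for v, cnt in counts.items() if cnt / m >= gamma}
```

For the stream `[(0, 1, 2), (0, 1, 3), (1, 1, 2), (0, 0, 2)]` and subcube `T = (0, 1)`, the joint value `(0, 1)` appears in half of the items, so `all_query(stream, (0, 1), 0.5)` returns `{(0, 1)}`.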
Taxonomy
Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Algorithms and Data Compression
