Finding Subcube Heavy Hitters in Analytics Data Streams

Branislav Kveton; S. Muthukrishnan; Hoa T. Vu; Yikun Xian

arXiv:1708.05159 · cs.DS · February 22, 2018


Open Access

TL;DR

This paper gives streaming algorithms for finding subcube heavy hitters in high-dimensional data streams. Under a Naive Bayes assumption on the dimensions, the space requirement drops from $\tilde{O}(kd/\gamma)$, which is quadratic in $d$ in the worst case, to $\tilde{O}(d/\gamma)$, which is linear in $d$.

Contribution

The paper presents a model-based approach that exploits Naive Bayes assumptions to develop a two-pass algorithm with reduced space complexity for subcube heavy hitters.

Findings

01

Achieves $\tilde{O}(kd/\gamma)$ space complexity for the general problem.

02

Provides a two-pass $\tilde{O}(d/\gamma)$-space algorithm under Naive Bayes assumptions.

03

Develops a fast method for answering all-subcube queries in $O(k/\gamma^2)$ time.
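
One way to see why a Naive Bayes assumption can remove the factor of $k$ from the space bound: if the dimensions were fully independent (an assumption made here purely for illustration, as the simplest case of such a model), the joint frequency of any subcube value would factor into a product of one-dimensional frequencies, so $d$ per-dimension summaries would serve every subcube. The Python sketch below is not the paper's algorithm. `marginal_counts` and `product_estimate` are hypothetical names, and the sketch keeps exact per-dimension counters rather than small-space summaries, so it does not meet the stated space bound.

```python
from collections import Counter
from math import prod
from typing import List, Sequence, Tuple

Item = Tuple[int, ...]


def marginal_counts(stream: Sequence[Item], d: int) -> List[Counter]:
    """Count, for each of the d dimensions, how often each value occurs."""
    counts = [Counter() for _ in range(d)]
    for x in stream:
        for i in range(d):
            counts[i][x[i]] += 1
    return counts


def product_estimate(counts: List[Counter], m: int,
                     T: Sequence[int], v: Sequence[int]) -> float:
    """Estimate the joint frequency of v on coordinates T (0-indexed)
    as the product of one-dimensional frequencies.
    Only meaningful under the illustrative independence assumption."""
    return prod(counts[i][val] / m for i, val in zip(T, v))


# Toy stream in which the two dimensions are exactly independent.
toy_stream = [(1, 3), (1, 4), (2, 3), (2, 4)] * 2
toy_counts = marginal_counts(toy_stream, d=2)
toy_estimate = product_estimate(toy_counts, len(toy_stream), T=(0, 1), v=(1, 3))
```

On this toy stream the product of the two marginal frequencies coincides with the true joint frequency of `(1, 3)`. On data where the dimensions are correlated, the product can be arbitrarily far from the joint frequency, which is why the modelling assumption matters.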

Abstract

Data streams typically have items with a large number of dimensions. We study the fundamental heavy-hitters problem in this setting. Formally, the data stream consists of $d$-dimensional items $x_{1}, \dots, x_{m} \in [n]^{d}$. A $k$-dimensional subcube $T$ is a subset of distinct coordinates $\{T_{1}, \dots, T_{k}\} \subseteq [d]$. A subcube heavy hitter query $\mathrm{Query}(T, v)$, $v \in [n]^{k}$, outputs YES if $f_{T}(v) \geq \gamma$ and NO if $f_{T}(v) < \gamma/4$, where $f_{T}(v)$ is the fraction of stream items whose coordinates $T$ have joint values $v$. The all subcube heavy hitters query $\mathrm{AllQuery}(T)$ outputs all joint values $v$ that return YES to $\mathrm{Query}(T, v)$. The one-dimensional version of this problem, where $d = 1$, has been heavily studied in data stream theory, databases, networking, and signal processing. The subcube heavy hitters problem is applicable in all these cases. We present a…
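
The definitions of $f_{T}(v)$, Query, and AllQuery in the abstract can be made concrete with an exact, non-streaming reference implementation. The Python sketch below stores the whole stream and counts exactly, so it uses far more space than any streaming algorithm and is meant only to pin down what the queries return. The function names and the toy stream are assumptions for illustration, and coordinates are 0-indexed here. Because it computes $f_{T}(v)$ exactly, it answers YES precisely when $f_{T}(v) \geq \gamma$, which is one valid behaviour under the promise (values between $\gamma/4$ and $\gamma$ may be answered either way).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Item = Tuple[int, ...]


def subcube_frequencies(stream: Sequence[Item],
                        T: Sequence[int]) -> Dict[Tuple[int, ...], float]:
    """Exact f_T(v) for every joint value v observed on coordinates T."""
    m = len(stream)
    counts = Counter(tuple(x[i] for i in T) for x in stream)
    return {v: c / m for v, c in counts.items()}


def query(stream: Sequence[Item], T: Sequence[int],
          v: Sequence[int], gamma: float) -> bool:
    """Return True (YES) iff f_T(v) >= gamma, using exact counts."""
    return subcube_frequencies(stream, T).get(tuple(v), 0.0) >= gamma


def all_query(stream: Sequence[Item], T: Sequence[int],
              gamma: float) -> List[Tuple[int, ...]]:
    """Return every joint value v on coordinates T with f_T(v) >= gamma."""
    freqs = subcube_frequencies(stream, T)
    return sorted(v for v, f in freqs.items() if f >= gamma)


# Toy stream with d = 3 and m = 8; the subcube is the first two coordinates.
example_stream = [
    (1, 2, 3), (1, 2, 4), (1, 2, 3), (2, 2, 3),
    (1, 1, 3), (1, 2, 3), (2, 1, 4), (1, 2, 4),
]
heavy = all_query(example_stream, T=(0, 1), gamma=0.5)
```

In the toy stream, the joint value `(1, 2)` appears on coordinates `(0, 1)` in 5 of the 8 items, so it is the only value returned at `gamma=0.5`.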


Taxonomy

Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Algorithms and Data Compression