Graphical Model Sketch
Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, and Siqi Sun

TL;DR
This paper combines graphical models with count-min (CM) sketches to estimate probabilities efficiently in structured data streams with high-cardinality variables, improving accuracy and scalability over the CM sketch alone.
Contribution
It proposes several approaches that use the structure of a graphical model to factor the joint distribution and estimate each factor with a sketch. It proves error bounds for these approximations and demonstrates their practical gains.
Findings
Error bounds are multiplicative and improve on those of the CM sketch.
Achieves an order-of-magnitude improvement in probability estimation over the CM sketch.
Space complexity is independent of variable cardinality.
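For reference, the standard CM sketch guarantee that serves as the baseline here, and the factorization a Bayesian network provides, can be written as below. The notation is ours, and the paper's own multiplicative bounds are not reproduced.

```latex
% Count-min sketch with depth d and width w over a stream of N items;
% n(x) is the true count of item x, p(x) = n(x)/N, C is the d x w counter table
\hat{n}(x) = \min_{1 \le j \le d} C\bigl[j, h_j(x)\bigr], \qquad n(x) \le \hat{n}(x)

% With w = \lceil e/\varepsilon \rceil and d = \lceil \ln(1/\delta) \rceil,
% the following holds with probability at least 1 - \delta:
\hat{n}(x) \le n(x) + \varepsilon N
\quad\Longrightarrow\quad
\frac{\hat{n}(x)}{N} \le p(x) + \varepsilon

% Bayesian-network factorization over variables x_1, \dots, x_K
p(x) = \prod_{i=1}^{K} p\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr)
```

The CM error is additive in ε, and the sketch stores d·w counters no matter how many distinct values appear. For rare items with p(x) much smaller than ε, the additive term dominates the estimate, which is why a multiplicative bound (error relative to p(x)) is the more informative comparison.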
Abstract
Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches; and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by "sketches", which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the…
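To make the key idea in the abstract concrete, here is a minimal Python sketch, assuming a Bayesian network whose structure (the `parents` lists) is known in advance and whose factors are conditional probabilities estimated as ratios of sketched counts. The class names `CountMinSketch` and `GraphicalModelSketch`, the BLAKE2-based row hashes, and the clipping of ratios at 1 are hypothetical illustrative choices, not the estimators analyzed in the paper.

```python
import hashlib
from typing import Hashable, List, Sequence


class CountMinSketch:
    """Standard count-min sketch: depth x width counters, one hash per row."""

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        self.seed = seed
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: Hashable) -> int:
        # Salt the key with the seed and row so each row uses a different hash.
        data = f"{self.seed}:{row}:{key!r}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: Hashable, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def query(self, key: Hashable) -> int:
        # With positive increments, collisions only add counts,
        # so the row minimum never underestimates the true count.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


class GraphicalModelSketch:
    """Hypothetical illustration: estimate each Bayesian-network factor
    p(x_i | x_pa(i)) from two CM sketches, one counting (x_i, x_pa(i))
    and one counting x_pa(i). Memory is fixed by width and depth,
    not by how many distinct values a variable can take."""

    def __init__(self, parents: List[List[int]], width: int, depth: int) -> None:
        self.parents = parents
        self.n = 0  # number of stream items seen so far
        k = len(parents)
        self.joint = [CountMinSketch(width, depth, seed=2 * i) for i in range(k)]
        self.marg = [CountMinSketch(width, depth, seed=2 * i + 1) for i in range(k)]

    def update(self, x: Sequence[Hashable]) -> None:
        self.n += 1
        for i, pa in enumerate(self.parents):
            pa_vals = tuple(x[j] for j in pa)
            self.joint[i].add((x[i], pa_vals))
            if pa:  # root variables are normalized by n instead
                self.marg[i].add(pa_vals)

    def estimate(self, x: Sequence[Hashable]) -> float:
        # p(x) ~= prod_i n(x_i, x_pa(i)) / n(x_pa(i)), every count read from a sketch.
        p = 1.0
        for i, pa in enumerate(self.parents):
            pa_vals = tuple(x[j] for j in pa)
            num = self.joint[i].query((x[i], pa_vals))
            den = self.marg[i].query(pa_vals) if pa else self.n
            if den == 0:
                return 0.0
            # Collisions can push the ratio above 1; clip as a simple safeguard.
            p *= min(1.0, num / den)
        return p


# Usage: a chain X0 -> X1 -> X2 observed as a stream of tuples.
stream = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
gms = GraphicalModelSketch(parents=[[], [0], [1]], width=2**16, depth=4)
for item in stream:
    gms.update(item)
print(gms.estimate((0, 1, 1)))
```

Each factor uses two depth × width counter tables, so memory grows with the number of factors rather than with the number of distinct values of each variable. Widening the sketches reduces hash collisions, and therefore overestimation, at the cost of more counters.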
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Machine Learning and Algorithms
