Optimal Lower Bound for Itemset Frequency Indicator Sketches
Eric Price

TL;DR
This paper proves a lower bound of Ω((1/ε) d log(εd)) bits on the space needed for itemset frequency indicator sketches, even for pairs of items, matching known upper bounds.
Contribution
It improves the lower bound of Liberty, Mitzenmacher, and Thaler [LMT14] on the space needed for itemset frequency indicator sketches. The improved bound holds even for pairs (k = 2) and matches existing upper bounds.
Findings
Lower bound of Ω((1/ε) d log(εd)) bits, even for k = 2
Matches the sampling upper bound for all ε ≥ 1/d^{0.99}
For k = 2, also matches a trivial upper bound at ε = 1/d
Abstract
Given a database, a common problem is to find the pairs or k-tuples of items that frequently co-occur. One specific problem is to create a small-space "sketch" of the data that records which k-tuples appear in more than an ε fraction of rows of the database. We improve the lower bound of Liberty, Mitzenmacher, and Thaler [LMT14], showing that Ω((1/ε) d log(εd)) bits are necessary even in the case of k = 2. This matches the sampling upper bound for all ε ≥ 1/d^{0.99}, and (in the case of k = 2) another trivial upper bound for ε = 1/d.
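To make the problem statement concrete, here is a minimal Python illustration of the pairs case. It is not the paper's construction. The names `exact_frequent_pairs` and `SampledRowsSketch`, the sample size, and the decision threshold are hypothetical choices made for this example. The first function computes the exact answer from the full database. The class mimics the idea behind a sampling upper bound by keeping only a random subset of rows and answering queries from that subset.

```python
import random
from itertools import combinations


def exact_frequent_pairs(rows, eps):
    """Return the item pairs that co-occur in more than an eps fraction of rows."""
    n = len(rows)
    counts = {}
    for row in rows:
        # Count every unordered pair of items present in this row.
        for pair in combinations(sorted(row), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, c in counts.items() if c > eps * n}


class SampledRowsSketch:
    """Stores a uniform sample of rows (drawn with replacement).

    The sample size is an assumption of this example; it is not tuned
    to any guarantee stated in the paper.
    """

    def __init__(self, rows, num_samples, seed=0):
        rng = random.Random(seed)
        self.sample = [frozenset(rng.choice(rows)) for _ in range(num_samples)]

    def estimate(self, a, b):
        """Fraction of sampled rows that contain both items a and b."""
        hits = sum(1 for row in self.sample if a in row and b in row)
        return hits / len(self.sample)

    def is_frequent(self, a, b, threshold):
        """Indicator answer: does the sampled frequency exceed the threshold?"""
        return self.estimate(a, b) > threshold


if __name__ == "__main__":
    # Toy database: items 0 and 1 co-occur in every row, item 2 in 70% of rows.
    rows = [{0, 1, 2}] * 70 + [{0, 1}] * 30
    print(exact_frequent_pairs(rows, 0.8))   # {(0, 1)}
    sketch = SampledRowsSketch(rows, num_samples=20)
    print(sketch.is_frequent(0, 1, 0.75))    # True: the pair is in every row
```

The sketch stores whole rows, so its size grows with both the number of sampled rows and the number of distinct items d. That is the quantity the lower bound constrains.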
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference
