Streaming Algorithms for Support-Aware Histograms
Justin Y. Chen, Piotr Indyk, Tal Wagner

TL;DR
This paper introduces support-aware streaming algorithms for histograms that focus on approximation errors within the distribution's support, enabling more efficient and accurate data summaries in streaming contexts.
Contribution
It proposes novel support-aware error measures and develops near-optimal 1-pass and 2-pass streaming algorithms with sub-linear space complexity.
Findings
Measuring error only on the distribution's support improves histogram approximation accuracy.
Exponential gap in space complexity between 1-pass and 2-pass algorithms.
Algorithms demonstrate effectiveness on real and synthetic datasets.
Abstract
Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured using the $\ell_p$ norm, which sums the differences between the two functions over all items in the domain. Although useful in many applications, the drawback of this error measure is that it treats approximation errors of all items in the same way, irrespective of whether the mass of an item is important for the downstream application that uses the approximation. As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution…
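To make the contrast in the abstract concrete, here is a minimal Python sketch comparing an error summed over the whole domain with one restricted to the support. It is an illustration only, not the paper's definition: the function names `domain_error` and `support_error`, the rule "support = items with nonzero mass", and the toy data are all assumptions, and the abstract shown here is truncated before the formal definition.

```python
from typing import Sequence


def domain_error(dist: Sequence[float], hist: Sequence[float], p: float = 1.0) -> float:
    """Sum of |dist[i] - hist[i]|^p over every item in the domain (no p-th root taken)."""
    return sum(abs(d - h) ** p for d, h in zip(dist, hist))


def support_error(dist: Sequence[float], hist: Sequence[float], p: float = 1.0) -> float:
    """Same sum, restricted to items with nonzero mass in dist.

    ASSUMPTION: "support" is read as {i : dist[i] > 0}; the paper's formal
    definition may differ in details.
    """
    return sum(abs(d - h) ** p for d, h in zip(dist, hist) if d > 0)


# Toy distribution on 8 items whose mass sits on every other item.
dist = [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]

# A one-bucket histogram: a single constant value across the whole domain.
hist = [0.25] * 8

print(domain_error(dist, hist))   # error counted on all 8 items
print(support_error(dist, hist))  # error counted only where dist has mass
```

In this toy case the one-bucket histogram matches every item in the support exactly, so the support-restricted error is zero, while the full-domain sum is charged for the four empty items. This is the kind of simple distribution that a succinct histogram cannot fit well under the traditional measure.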
Taxonomy
Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Data Management and Algorithms
