Histograms and Wavelets on Probabilistic Data
Graham Cormode, Minos Garofalakis

TL;DR
This paper develops algorithms for building histogram and wavelet summaries of probabilistic data. The summaries give compact, accurate approximations of uncertain relations, supporting query processing and data exploration in probabilistic databases.
Contribution
The paper introduces algorithms for building histograms and wavelet synopses over probabilistic data, extending the dynamic programming techniques used for deterministic data so that the chosen summary is optimized under uncertainty.
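To make the dynamic programming idea concrete, here is a minimal sketch, not the paper's own algorithm. It assumes each item carries an independent discrete distribution over its frequency, that each bucket stores one fixed representative value, and that the target is expected sum-squared error. Under those assumptions the cost of a bucket depends only on the first and second moments of its items, so the classic V-optimal recurrence applies. The names `Pdf`, `moments`, and `expected_sse_histogram` are hypothetical.

```python
from typing import List, Tuple

# Assumed input model: one discrete pdf per item, as (frequency, probability) pairs.
Pdf = List[Tuple[float, float]]


def moments(pdf: Pdf) -> Tuple[float, float]:
    """Return (E[g], E[g^2]) for one item's frequency distribution."""
    m1 = sum(v * p for v, p in pdf)
    m2 = sum(v * v * p for v, p in pdf)
    return m1, m2


def expected_sse_histogram(pdfs: List[Pdf], num_buckets: int):
    """Split items 0..n-1 into contiguous buckets minimizing expected SSE.

    Returns (minimum expected error, list of half-open (start, end) buckets).
    """
    n = len(pdfs)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and the number of items")

    # Prefix sums of first and second moments give O(1) bucket costs.
    s1, s2 = [0.0], [0.0]
    for pdf in pdfs:
        m1, m2 = moments(pdf)
        s1.append(s1[-1] + m1)
        s2.append(s2[-1] + m2)

    def cost(i: int, j: int) -> float:
        # Expected SSE of bucket [i, j) when its representative is the
        # mean of the expected frequencies (the minimizer for this objective).
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    back = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = best[b - 1][i] + cost(i, j)
                if candidate < best[b][j]:
                    best[b][j] = candidate
                    back[b][j] = i

    # Walk the back-pointers to recover the bucket boundaries.
    buckets = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = back[b][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return best[num_buckets][n], buckets
```

The triple loop costs O(B n^2) for n items and B buckets. Other error metrics would need a different `cost` function and, in general, more than the first two moments.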
Findings
Algorithms outperform simple sampling methods in accuracy.
Proposed methods are computationally efficient.
Enhanced data summaries improve probabilistic query processing.
Abstract
There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Similar to their deterministic relation counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the…
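The coefficient-selection problem mentioned in the abstract can be illustrated for one simple case. The sketch below assumes the error metric is expected sum-squared error. Under that assumption the expected error splits into the per-item variances, which no synopsis can reduce, plus the squared error against the expected frequencies. Selecting coefficients then reduces to the deterministic problem on the expected values, where keeping the largest orthonormal Haar coefficients is optimal. This is an illustration under that assumption, not the paper's general method. The function names are hypothetical and the input length is assumed to be a power of two.

```python
import math
from typing import List, Tuple


def haar_forward(values: List[float]) -> List[float]:
    """Orthonormal Haar transform; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    length = n
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        averages = [(out[2 * k] + out[2 * k + 1]) / root2 for k in range(half)]
        details = [(out[2 * k] - out[2 * k + 1]) / root2 for k in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_inverse(coeffs: List[float]) -> List[float]:
    """Invert haar_forward."""
    out = list(coeffs)
    n = len(out)
    root2 = math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / root2)
            merged.append((a - d) / root2)
        out[:2 * length] = merged
        length *= 2
    return out


def top_b_synopsis(expected: List[float], b: int) -> Tuple[List[float], List[float]]:
    """Keep the b largest-magnitude coefficients of the expected frequencies.

    Returns (sparse coefficient vector, reconstructed approximation).
    """
    coeffs = haar_forward(expected)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:b])
    sparse = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    return sparse, haar_inverse(sparse)
```

For example, `top_b_synopsis([2, 2, 6, 6], 1)` keeps only the overall-average coefficient and reconstructs a flat approximation, while `b = 2` recovers the input exactly, up to floating-point rounding, because the remaining coefficients are zero.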
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Stream Mining Techniques
