A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP
Kamel Aouiche, Daniel Lemire

TL;DR
This paper compares five probabilistic view-size estimation techniques for data warehouses. Only three of them give tight estimates regardless of view size, and of those three only Adaptive Counting stays fast as the memory budget grows.
Contribution
It presents an experimental comparison of five unassuming, hashing-based probabilistic view-size estimation techniques, and finds that Adaptive Counting is both accurate and consistently fast.
Findings
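Only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide tight estimates irrespective of the size of the view.
Of those three, only Adaptive Counting remains consistently fast as the memory budget increases.
Many other view-size estimation techniques make particular statistical assumptions, and their error can be large.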
Abstract
A data warehouse cannot materialize all possible views, hence we must estimate quickly, accurately, and reliably the size of views to determine the best candidates for materialization. Many available techniques for view-size estimation make particular statistical assumptions and their error can be large. Comparatively, unassuming probabilistic techniques are slower, but they estimate accurately and reliably very large view sizes using little memory. We compare five unassuming hashing-based view-size estimation techniques including Stochastic Probabilistic Counting and LogLog Probabilistic Counting. Our experiments show that only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide universally tight estimates irrespective of the size of the view; of those, only Adaptive Counting remains constantly fast as we increase the memory budget.
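To make the idea of hashing-based view-size estimation concrete, here is a minimal Python sketch in the spirit of Gibbons-Tirthapura sampling. It is an illustration under assumptions, not the authors' implementation: the function names, the use of BLAKE2b as the hash function, and the `memory` parameter are all hypothetical choices made for this example. The size of a view is taken to be the number of distinct value combinations over its group-by columns.

```python
import hashlib


def _hash64(item):
    """Hash an arbitrary tuple to a 64-bit integer (BLAKE2b is an assumption)."""
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_view_size(rows, dims, memory=256):
    """Estimate the number of distinct projections of `rows` onto `dims`.

    Keeps at most `memory` hash values. A hash is kept only if its `t`
    lowest bits are zero, so each distinct key survives with probability
    2**-t. When the sample overflows, `t` is incremented and the sample
    is pruned.
    """
    t = 0
    kept = set()
    for row in rows:
        h = _hash64(tuple(row[d] for d in dims))
        if h & ((1 << t) - 1) == 0:
            kept.add(h)
            while len(kept) > memory:
                t += 1
                mask = (1 << t) - 1
                kept = {x for x in kept if x & mask == 0}
    # Each kept hash stands for roughly 2**t distinct keys.
    return len(kept) * (1 << t)


if __name__ == "__main__":
    # Hypothetical fact table with two dimension columns.
    facts = [(i % 1000, i % 7) for i in range(50000)]
    print(estimate_view_size(facts, dims=(0,)))
    print(estimate_view_size(facts, dims=(0, 1)))
```

While the number of distinct keys is below `memory`, the sketch counts them exactly (up to hash collisions); beyond that, it trades accuracy for bounded memory, which is the trade-off the abstract describes.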
Taxonomy
Topics: Data Stream Mining Techniques · Cloud Computing and Resource Management · Advanced Database Systems and Queries
