Enhancing Histograms by Tree-Like Bucket Indices
Francesco Buccafurri, Gianluca Lax, Domenico Saccà, Luigi Pontieri, and Domenico Rosaci

TL;DR
This paper adds a 32-bit, 4-level tree index to each histogram bucket to improve frequency estimation inside the bucket. Query result size estimates become more accurate even though the extra storage per bucket leaves room for fewer buckets.
Contribution
The paper proposes a novel 4-level tree index for more accurate frequency estimation inside a bucket, which improves existing histogram techniques such as MaxDiff and V-Optimal.
Findings
The 4-level tree index outperforms other methods in frequency estimation accuracy.
Adding the index improves the accuracy of MaxDiff and V-Optimal histograms.
The approach trades 32 bits of extra storage per bucket for higher estimation precision.
Abstract
Histograms are used to summarize the contents of relations into a number of buckets for the estimation of query result sizes. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for determining bucket boundaries that provide accurate estimations. However, while search strategies for optimal bucket boundaries are rather sophisticated, not much attention has been paid to estimating queries inside buckets, and all of the above techniques adopt naive methods for such estimation. This paper focuses on the problem of improving the estimation inside a bucket once its boundaries have been fixed. The proposed technique is based on the addition, to each bucket, of 32 bits of additional information (organized into a 4-level tree index), storing approximate cumulative frequencies at 7 internal intervals of the bucket. Both theoretical analysis and experimental results…
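The idea of packing 7 approximate partial sums into 32 bits can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the bucket is split into 8 equal-width pieces, each tree node stores the left child's share of its parent's sum as a quantized ratio, and the 32 bits are divided into one 6-bit, two 5-bit, and four 4-bit fields (6 + 10 + 16 = 32). The function names are hypothetical and this is not presented as the authors' exact encoding.

```python
from typing import List, Sequence

# Assumed field widths per tree level below the root:
# 1 node x 6 bits, 2 nodes x 5 bits, 4 nodes x 4 bits = 32 bits in total.
LEVEL_BITS = (6, 5, 4)


def _quantize(part: int, total: int, bits: int) -> int:
    """Encode the ratio part/total as an integer in [0, 2**bits - 1]."""
    if total == 0:
        return 0
    return round(part / total * (2 ** bits - 1))


def build_index(freqs: Sequence[int]) -> int:
    """Pack a 32-bit index from the exact sums of 8 equal-width pieces."""
    assert len(freqs) == 8
    word = 0
    for level, bits in enumerate(LEVEL_BITS):
        nodes = 2 ** level      # 1, 2, 4 parent intervals at this level
        width = 8 // nodes      # number of pieces covered by each parent
        for n in range(nodes):
            lo = n * width
            parent = sum(freqs[lo:lo + width])
            left = sum(freqs[lo:lo + width // 2])
            word = (word << bits) | _quantize(left, parent, bits)
    return word


def decode_index(word: int, total: float) -> List[float]:
    """Recover approximate sums of the 8 pieces from the index and bucket sum."""
    fields = []
    shift = 32
    for level, bits in enumerate(LEVEL_BITS):
        for _ in range(2 ** level):
            shift -= bits
            fields.append((word >> shift) & (2 ** bits - 1))
    sums = [float(total)]
    field_iter = iter(fields)
    for bits in LEVEL_BITS:
        next_sums = []
        for s in sums:
            left = s * next(field_iter) / (2 ** bits - 1)
            next_sums.extend([left, s - left])  # right child is the remainder
        sums = next_sums
    return sums


# Usage: a skewed bucket where a uniform-spread estimate is far off.
freqs = [40, 0, 0, 0, 0, 0, 0, 24]
total = sum(freqs)
word = build_index(freqs)
approx = decode_index(word, total)
uniform_first_piece = total / 8  # naive estimate for the first piece
print(f"index word: {word:#010x}")
print(f"first piece: true={freqs[0]}, indexed={approx[0]:.2f}, "
      f"uniform={uniform_first_piece:.2f}")
```

In this sketch the right child is always derived as the parent minus the left child, so only 7 ratios need to be stored and the decoded pieces always add up to the bucket sum. Giving fewer bits to deeper levels is a design choice of the sketch: errors near the root affect every estimate, so they get the finest quantization.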
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression
