A Learning Framework for Self-Tuning Histograms
Raajay Viswanathan, Prateek Jain, Srivatsan Laxman, Arvind Arasu

TL;DR
This paper introduces a learning-based framework for self-tuning histograms that adapt to query workloads. It proposes two algorithms, EquiHist and SpHist, gives formal guarantees for equi-width histograms, and reports up to 50% less error than ISOMER on real datasets.
Contribution
The paper presents a general learning-theoretic formulation for self-tuning histograms, introduces the EquiHist and SpHist algorithms, and supports them with theoretical analysis and empirical validation.
Findings
SpHist achieves up to 50% less error than ISOMER on real datasets.
EquiHist is competitive and scalable for equi-width histograms.
The framework supports multi-dimensional data and dynamic updates.
Abstract
In this paper, we consider the problem of estimating self-tuning histograms using query workloads. To this end, we propose a general learning theoretic formulation. Specifically, we use query feedback from a workload as training data to estimate a histogram with a small memory footprint that minimizes the expected error on future queries. Our formulation provides a framework in which different approaches can be studied and developed. We first study the simple class of equi-width histograms and present a learning algorithm, EquiHist, that is competitive in many settings. We also provide formal guarantees for equi-width histograms that highlight scenarios in which equi-width histograms can be expected to succeed or fail. We then go beyond equi-width histograms and present a novel learning algorithm, SpHist, for estimating general histograms. Here we use Haar wavelets to reduce the problem…
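The idea of treating query feedback as training data can be illustrated with a minimal sketch. This is not the paper's EquiHist algorithm; it is a generic ridge-regularized least-squares fit of equi-width bucket counts to observed range-query cardinalities over a one-dimensional domain [0, n). The function names (`overlap_matrix`, `fit_equiwidth`, `estimate`), the `ridge` parameter, and the assumption that data is uniform within each bucket are all hypothetical choices made for illustration.

```python
import numpy as np

def overlap_matrix(queries, n, b):
    """Row i holds the fraction of each of the b equi-width buckets
    covered by range query i = [lo, hi) on the domain [0, n)."""
    edges = np.linspace(0, n, b + 1)
    lo = np.array([q[0] for q in queries], dtype=float)[:, None]
    hi = np.array([q[1] for q in queries], dtype=float)[:, None]
    left = np.maximum(lo, edges[:-1][None, :])
    right = np.minimum(hi, edges[1:][None, :])
    width = edges[1:] - edges[:-1]
    # Negative lengths mean the query and bucket do not intersect.
    return np.clip(right - left, 0.0, None) / width

def fit_equiwidth(queries, counts, n, b, ridge=1e-6):
    """Fit bucket counts w by minimizing ||A w - counts||^2 + ridge * ||w||^2
    (an assumed objective, not taken from the paper)."""
    A = overlap_matrix(queries, n, b)
    y = np.asarray(counts, dtype=float)
    w = np.linalg.solve(A.T @ A + ridge * np.eye(b), A.T @ y)
    # Bucket counts cannot be negative.
    return np.clip(w, 0.0, None)

def estimate(w, query, n):
    """Estimate the cardinality of a new range query from the fitted buckets."""
    return float((overlap_matrix([query], n, len(w)) @ w)[0])
```

Under these assumptions, each training query contributes one linear equation in the unknown bucket counts, so the memory footprint of the learned histogram is fixed by the bucket count `b` regardless of how many queries are observed.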
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Data Stream Mining Techniques
