Using Exponential Histograms to Approximate the Quantiles of Heavy- and Light-Tailed Data
Philip T. Labo

TL;DR
This paper investigates exponential histograms as quantile estimators for streaming data. It analyzes their size, accuracy, occupancy, and largest gap under heavy- and light-tailed input distributions.
Contribution
The study analyzes in detail the size, accuracy, occupancy, and gaps of exponential histograms when inputs follow exponential or Pareto distributions, which stand in for light- and heavy-tailed data, respectively.
Findings
Size grows like log n and approaches a Gumbel distribution as the number of inputs n grows large.
The missing mass to the right of the histogram and the mass of its final bin are bounded.
Largest gap size is approximated, revealing distribution-dependent behavior.
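The findings above can be explored with a small simulation. The following is a minimal sketch, not the paper's method: it assumes bins of the form (rho**(k-1), rho**k] with a hypothetical rho = 1.1, draws exponential samples with rate 1 and Pareto samples with shape 1 (both parameter choices are assumptions), and reports the number of occupied bins and the longest run of empty bins between occupied ones. The paper's exact definitions of size and gap may differ from these helper functions.

```python
import math
import random


def occupied_bins(samples, rho=1.1):
    """Indices k such that some positive sample lies in (rho**(k-1), rho**k]."""
    log_rho = math.log(rho)
    # Non-positive values have no bin in this scheme, so they are skipped.
    return {math.ceil(math.log(x) / log_rho) for x in samples if x > 0}


def largest_gap(bins):
    """Longest run of consecutive empty bins between occupied bins."""
    ordered = sorted(bins)
    return max((b - a - 1 for a, b in zip(ordered, ordered[1:])), default=0)


rng = random.Random(0)  # fixed seed so repeated runs agree
for n in (10**2, 10**3, 10**4, 10**5):
    # Rate 1.0 and shape 1.0 are illustrative assumptions.
    exp_bins = occupied_bins([rng.expovariate(1.0) for _ in range(n)])
    par_bins = occupied_bins([rng.paretovariate(1.0) for _ in range(n)])
    print(n, len(exp_bins), largest_gap(exp_bins),
          len(par_bins), largest_gap(par_bins))
```

Each printed row shows n followed by the occupied-bin count and largest gap for the exponential sample, then the same two numbers for the Pareto sample, so the growth with n can be compared across the two tail types.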
Abstract
Exponential histograms, with bins of the form $\{(\rho^{k-1}, \rho^{k}]\}_{k \in \mathbb{Z}}$, for $\rho > 1$, straightforwardly summarize the quantiles of streaming data sets (Masson et al. 2019). While they guarantee the relative accuracy of their estimates, they appear to use only $\log n$ values to summarize $n$ inputs. We study four aspects of exponential histograms -- size, accuracy, occupancy, and largest gap size -- when inputs are i.i.d. exponential or i.i.d. Pareto, taking the exponential (or, Pareto) distribution to represent all light- (or, heavy-) tailed distributions. We show that, in these settings, size grows like $\log n$ and takes on a Gumbel distribution as $n$ grows large. We bound the missing mass to the right of the histogram and the mass of its final…
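To make the data structure concrete, here is a minimal sketch of an exponential histogram for positive values. It assumes bins of the form (rho**(k-1), rho**k] for integer k and rho > 1. The class name, the default rho = 1.02, and the choice of 2 * rho**k / (rho + 1) as each bin's representative value are illustrative assumptions rather than the paper's implementation. With that representative, any value in a bin is estimated with relative error at most (rho - 1) / (rho + 1).

```python
import math
from collections import Counter


class ExponentialHistogram:
    """Counts positive values in bins (rho**(k-1), rho**k], k an integer."""

    def __init__(self, rho=1.02):
        if rho <= 1:
            raise ValueError("rho must exceed 1")
        self.rho = rho
        self.log_rho = math.log(rho)
        self.counts = Counter()  # bin index -> number of values seen
        self.n = 0

    def bin_index(self, x):
        # Smallest integer k with x <= rho**k (up to floating-point rounding).
        return math.ceil(math.log(x) / self.log_rho)

    def add(self, x):
        if x <= 0:
            raise ValueError("this sketch handles positive values only")
        self.counts[self.bin_index(x)] += 1
        self.n += 1

    def quantile(self, q):
        # Walk bins in increasing order until the cumulative count hits the rank.
        rank = max(1, math.ceil(q * self.n))
        seen = 0
        for k in sorted(self.counts):
            seen += self.counts[k]
            if seen >= rank:
                # Representative value for bin k (an assumed choice).
                return 2 * self.rho ** k / (self.rho + 1)
        raise ValueError("empty histogram")


hist = ExponentialHistogram(rho=1.02)
for value in range(1, 1001):
    hist.add(float(value))
print(len(hist.counts), hist.quantile(0.5))
```

The usage lines store 1,000 inputs in far fewer bins, and the median estimate stays within the stated relative error of the true median, 500.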
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Industrial Vision Systems and Defect Detection · Neural Networks and Applications
