The Essential Histogram
Housen Li, Axel Munk, Hannes Sieling, Guenther Walther

TL;DR
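The paper introduces the essential histogram: the histogram with the fewest bins among all histograms in a confidence set of distribution functions. It is designed to estimate probabilities and to detect features such as increases and modes, assuming only that the data are independent and identically distributed.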
Contribution
It constructs a confidence set of distribution functions and defines the essential histogram as the histogram in that set with the fewest bins, which gives a principled answer to how many bins to use and how wide to make them.
Findings
Provides a fast algorithm for constructing the essential histogram.
Demonstrates the method's effectiveness with real data examples.
Makes an R package implementing the method available on CRAN.
Abstract
The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins. We construct a confidence set of distribution functions that optimally address the two main tasks of the histogram: estimating probabilities and detecting features such as increases and modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. The only assumption we make is that the data are independent and identically distributed. We provide a fast algorithm for the essential histogram, and illustrate our methodology with examples. An R-package is available on CRAN.
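The "fewest bins inside a confidence set" idea can be illustrated with a small toy sketch. This is not the paper's method: it swaps in a simple Dvoretzky-Kiefer-Wolfowitz (DKW) band around the empirical distribution function in place of the authors' confidence set, and it uses a slow dynamic program instead of their fast algorithm. The function name `fewest_bin_histogram`, the parameter `alpha`, and the restriction of bin edges to observed data points are all assumptions made for illustration.

```python
import math


def fewest_bin_histogram(data, alpha=0.1):
    """Toy sketch: fewest-bin histogram whose CDF stays inside a DKW band.

    Assumption: a DKW band stands in for the paper's confidence set,
    and bin edges are restricted to observed data points.
    """
    x = sorted(set(float(v) for v in data))  # sorted distinct values
    n = len(x)
    if n < 2:
        raise ValueError("need at least two distinct observations")
    # Empirical CDF at the sorted points, rescaled to run from 0 to 1.
    G = [i / (n - 1) for i in range(n)]
    # Half-width of the DKW band at confidence level 1 - alpha.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def segment_ok(i, j):
        # One bin on [x[i], x[j]] has a straight-line CDF; check that the
        # line stays within eps of the empirical CDF at interior points.
        slope = (G[j] - G[i]) / (x[j] - x[i])
        return all(abs(G[i] + slope * (x[k] - x[i]) - G[k]) <= eps
                   for k in range(i + 1, j))

    # best[j] = fewest bins needed to cover x[0] .. x[j].
    best = [0] + [math.inf] * (n - 1)
    prev = [-1] * n
    for j in range(1, n):
        for i in range(j):
            if best[i] + 1 < best[j] and segment_ok(i, j):
                best[j], prev[j] = best[i] + 1, i

    # Walk back from the last point to recover the chosen bin edges.
    idx = [n - 1]
    while idx[-1] != 0:
        idx.append(prev[idx[-1]])
    idx.reverse()
    edges = [x[i] for i in idx]
    heights = [(G[b] - G[a]) / (x[b] - x[a]) for a, b in zip(idx, idx[1:])]
    return edges, heights
```

Calling `fewest_bin_histogram(sample)` on a list of numbers returns the bin edges and the density height of each bin. The toy version only shows the selection principle, picking the simplest histogram that the data cannot rule out at the chosen level; it carries none of the optimality properties claimed for the essential histogram.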
