Near-Optimal Bounds for Testing Histogram Distributions
Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Sihan Liu

TL;DR
This paper presents a near-optimal, efficient algorithm for testing whether a distribution over an ordered domain is a histogram with a specified number of bins, along with matching lower bounds on sample complexity.
Contribution
It introduces a nearly-optimal, computationally efficient algorithm for histogram distribution testing and establishes tight sample complexity bounds.
Findings
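Sample complexity is $\widetilde{\Theta}\left(\frac{\sqrt{nk}}{\varepsilon} + \frac{k}{\varepsilon^2} + \frac{\sqrt{n}}{\varepsilon^2}\right)$, where $n$ is the domain size, $k$ the number of bins, and $\varepsilon$ the distance parameter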
Algorithm is both near-optimal and computationally efficient
Provides nearly-matching lower bounds on sample complexity
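To get a feel for how the three terms of the bound trade off, the sketch below evaluates $\sqrt{nk}/\varepsilon + k/\varepsilon^2 + \sqrt{n}/\varepsilon^2$ for given parameters. The function name `histogram_testing_rate` is hypothetical, and the value it returns is only the growth rate: the constants and logarithmic factors hidden by the $\widetilde{\Theta}$ notation are dropped, so it is not a usable sample count.

```python
import math


def histogram_testing_rate(n: int, k: int, eps: float) -> float:
    """Evaluate sqrt(n*k)/eps + k/eps^2 + sqrt(n)/eps^2.

    Illustrative only: hidden constants and logarithmic factors are
    omitted, so this is a growth rate, not an actual sample size.
    """
    if n < 1 or not (1 <= k <= n) or not (0 < eps <= 1):
        raise ValueError("need n >= 1, 1 <= k <= n, and 0 < eps <= 1")
    return (
        math.sqrt(n * k) / eps      # term involving both domain size and bins
        + k / eps**2                # term driven by the number of bins
        + math.sqrt(n) / eps**2     # term driven by the domain size
    )


if __name__ == "__main__":
    # With n = 10_000, k = 100, eps = 0.1, each term is about 10_000.
    print(histogram_testing_rate(10_000, 100, 0.1))
```

For fixed $n$ and $\varepsilon$, increasing $k$ shows which term dominates: the $\sqrt{n}/\varepsilon^2$ term for small $k$, and the $k/\varepsilon^2$ term once $k$ exceeds roughly $\sqrt{n}$.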
Abstract
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $p$ on $[n]$, we want to distinguish between the cases that $p$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde \Theta…
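As a small illustration of the definition only (not of the paper's tester), the sketch below checks whether an explicitly given probability vector over $[n]$ is piecewise constant on at most $k$ intervals. The helper names `num_pieces` and `is_k_histogram` and the tolerance `tol` are assumptions made for this example. The testing problem itself is harder, since it only has sample access to the unknown distribution.

```python
from typing import Sequence


def num_pieces(p: Sequence[float], tol: float = 1e-12) -> int:
    """Return the minimum number of intervals on which p is constant."""
    if len(p) == 0:
        return 0
    # A new piece starts wherever two adjacent probabilities differ.
    breakpoints = sum(1 for a, b in zip(p, p[1:]) if abs(a - b) > tol)
    return 1 + breakpoints


def is_k_histogram(p: Sequence[float], k: int, tol: float = 1e-12) -> bool:
    """Check whether p is piecewise constant on at most k intervals."""
    return num_pieces(p, tol) <= k


if __name__ == "__main__":
    # Three constant runs: {1,2}, {3,4}, {5}.
    p = [0.1, 0.1, 0.3, 0.3, 0.2]
    print(num_pieces(p))         # number of constant runs
    print(is_k_histogram(p, 3))  # a 3-histogram
    print(is_k_histogram(p, 2))  # not a 2-histogram
```

The uniform distribution is the special case $k = 1$, and every distribution over $[n]$ is trivially an $n$-histogram.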
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Complexity and Algorithms in Graphs
