Near-Optimal Bounds for Testing Histogram Distributions

Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Sihan Liu

arXiv:2207.06596 · cs.DS · July 15, 2022 · 1 citation

Open Access

TL;DR

This paper presents a near-optimal, efficient algorithm for testing whether a distribution over an ordered domain is a histogram with a specified number of bins, along with matching lower bounds on sample complexity.

Contribution

The paper gives a sample near-optimal, computationally efficient algorithm for histogram testing and proves a lower bound that matches its sample complexity up to logarithmic factors.

Findings

01

Sample complexity is $\widetilde{\Theta}\left(\frac{\sqrt{nk}}{ε} + \frac{k}{ε^2} + \frac{\sqrt{n}}{ε^2}\right)$, where $n$ is the domain size, $k$ the number of bins, and $ε$ the distance parameter
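
To get a feel for how the three terms of this bound trade off, the sketch below evaluates $\sqrt{nk}/ε$, $k/ε^2$, and $\sqrt{n}/ε^2$ for given parameters, reading the bound as stated in the paper's abstract. It drops all constants and logarithmic factors, so the numbers indicate relative scale only and are not sample counts for any actual tester. The function names `bound_terms` and `dominant_term` are illustrative and do not come from the paper.

```python
import math


def bound_terms(n: int, k: int, eps: float) -> dict[str, float]:
    """Evaluate the three terms of the sample complexity bound.

    Constants and logarithmic factors are ignored (an assumption of
    this sketch), so only the relative sizes are meaningful.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0 < eps <= 1:
        raise ValueError("need 0 < eps <= 1")
    return {
        "sqrt(nk)/eps": math.sqrt(n * k) / eps,
        "k/eps^2": k / eps**2,
        "sqrt(n)/eps^2": math.sqrt(n) / eps**2,
    }


def dominant_term(n: int, k: int, eps: float) -> str:
    """Name the largest of the three terms for these parameters."""
    terms = bound_terms(n, k, eps)
    return max(terms, key=terms.get)


if __name__ == "__main__":
    for n, k, eps in [(10**6, 10, 0.5), (100, 50, 0.01)]:
        print(n, k, eps, dominant_term(n, k, eps))
```

With a large domain and moderate accuracy the $\sqrt{nk}/ε$ term tends to dominate, while for small $ε$ and many bins the $k/ε^2$ term takes over.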

02

Algorithm is both near-optimal and computationally efficient

03

Provides nearly-matching lower bounds on sample complexity

Abstract

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $p$ on $[n]$, we want to distinguish between the cases that $p$ is a $k$-histogram versus $ε$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde \Theta…
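
The definitions in the abstract can be made concrete with a small sketch. It assumes the distribution is given explicitly as a probability vector over $[n]$, which is not the setting of the paper: the testing problem only grants sample access, and this code is not the paper's algorithm. The helpers `num_flat_pieces`, `is_k_histogram`, and `tv_distance` and the tolerance `tol` are hypothetical names chosen for illustration.

```python
def num_flat_pieces(p: list[float], tol: float = 1e-12) -> int:
    """Count maximal runs of (numerically) equal consecutive values."""
    if not p:
        raise ValueError("p must be non-empty")
    pieces = 1
    for a, b in zip(p, p[1:]):
        if abs(a - b) > tol:  # a change in value starts a new interval
            pieces += 1
    return pieces


def is_k_histogram(p: list[float], k: int, tol: float = 1e-12) -> bool:
    """True if p is piecewise constant on at most k intervals."""
    return num_flat_pieces(p, tol) <= k


def tv_distance(p: list[float], q: list[float]) -> float:
    """Total variation distance: half the L1 distance."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = [0.1, 0.1, 0.3, 0.3, 0.2]  # constant on three intervals
    print(num_flat_pieces(p))       # number of flat pieces
    print(is_k_histogram(p, 2))     # too few bins allowed
    print(tv_distance([0.5, 0.5], [1.0, 0.0]))
```

A distribution is then $ε$-far from $k$-histograms when `tv_distance` to every $k$-histogram is at least $ε$; the sketch does not attempt that minimization.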

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Complexity and Algorithms in Graphs