Testing Support Size More Efficiently Than Learning Histograms

Renato Ferreira Pinto Jr.; Nathaniel Harms

arXiv:2410.18915·cs.DS·May 21, 2026


TL;DR

This paper introduces a more sample-efficient method for testing the support size of a distribution than traditional histogram learning, using Chebyshev polynomial approximations.

Contribution

The paper presents a testing algorithm for support size that uses fewer samples than histogram learning and nearly matches the best known lower bound.

Findings

1. Testing support size requires fewer samples than learning the histogram.
2. The new method nearly matches the best known lower bound on sample complexity.
3. The algorithm yields larger lower bounds on support size than previous methods.
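For context on the third finding, a trivial baseline for lower-bounding support size from samples is to count the distinct elements observed: every observed element has nonzero probability, so the count can never exceed the true support size. The sketch below shows only that naive baseline. It is not the paper's algorithm, and the function names are hypothetical.

```python
import random


def naive_support_lower_bound(samples):
    """Return the number of distinct observed elements.

    This is always a valid lower bound on the support size, because
    an element can only be sampled if it has nonzero probability.
    """
    return len(set(samples))


def draw_uniform_samples(n, m, seed=0):
    """Draw m samples from the uniform distribution on {0, ..., n-1}.

    Illustrative helper (an assumption, not from the paper).
    """
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(m)]
```

Because the count is at most the number of samples $m$, this baseline is weak whenever $m$ is much smaller than the support size.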

Abstract

Consider two problems about an unknown probability distribution $p$:

1. How many samples from $p$ are required to test if $p$ is supported on $n$ elements or not? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements, or it is "$\epsilon$-far" (in total variation distance) from being supported on $n$ elements.
2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce?

The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $\Theta\left(\frac{n}{\epsilon^{2} \log n}\right)$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O\left(\frac{n}{\epsilon \log n} \log(1/\epsilon)\right)$ samples, nearly matching the best known lower bound of $\Omega\left(\frac{n}{\epsilon \log n}\right)$. This algorithm also…


Taxonomy

Topics: Online Learning and Analytics