Testing Support Size More Efficiently Than Learning Histograms
Renato Ferreira Pinto Jr., Nathaniel Harms

TL;DR
This paper introduces a more sample-efficient method for testing the support size of a distribution than traditional histogram learning, using Chebyshev polynomial approximations.
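As background for the Chebyshev polynomials mentioned above, here is a minimal sketch of the standard three-term recurrence. It is not the paper's construction, and the helper name `chebyshev_t` is hypothetical. It only illustrates a textbook property: the polynomial T_k stays within [-1, 1] on the interval [-1, 1] and grows quickly outside it.

```python
def chebyshev_t(k, x):
    """Evaluate the degree-k Chebyshev polynomial of the first kind at x.

    Uses the recurrence T_0(x) = 1, T_1(x) = x,
    T_{j+1}(x) = 2 * x * T_j(x) - T_{j-1}(x).
    """
    if k < 0:
        raise ValueError("degree k must be non-negative")
    t_prev, t_curr = 1.0, float(x)
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


if __name__ == "__main__":
    degree = 10
    # Inside [-1, 1] the values stay bounded by 1 in absolute value.
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"T_{degree}({x}) = {chebyshev_t(degree, x):.6f}")
    # Just outside the interval the same polynomial is already much larger.
    for x in (1.1, 1.5, 2.0):
        print(f"T_{degree}({x}) = {chebyshev_t(degree, x):.6f}")
```

The recurrence is used here instead of a closed form because it needs only basic arithmetic and works for any real `x`.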
Contribution
It presents a novel testing algorithm that outperforms histogram learning in support size testing, nearly matching theoretical lower bounds.
Findings
Testing requires fewer samples than learning histograms.
The new method nearly matches the known lower bounds for sample complexity.
The algorithm also produces larger lower bounds on support size than previous methods.
Abstract
Consider two problems about an unknown probability distribution $p$:
1. How many samples from $p$ are required to test if $p$ is supported on $n$ elements or not? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements, or it is "$\epsilon$-far" (in total variation distance) from being supported on $n$ elements.
2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce?
The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $\Theta\left(\frac{n}{\epsilon^2 \log n}\right)$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O\left(\frac{n}{\epsilon \log n} \log(1/\epsilon)\right)$ samples, nearly matching the best known lower bound of $\Omega\left(\frac{n}{\epsilon \log n}\right)$. This algorithm also…
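To make the two problems concrete, here is a minimal sketch of the quantities involved, not the paper's algorithm. It writes p for the distribution and n for the support-size bound (symbol names assumed here), and both function names are hypothetical. The first function computes the exact total variation distance from a known p to the nearest distribution on at most n elements, which equals the probability mass outside the n heaviest elements. The second is the naive baseline for problem (2): the number of distinct elements seen in a sample is always a valid, though usually weak, lower bound on support size.

```python
def distance_to_support_size(p, n):
    """TV distance from p to the closest distribution on at most n elements.

    p is a dict mapping elements to probabilities. The closest such
    distribution keeps the n heaviest elements and moves the remaining
    mass onto them, so the distance is the mass outside the top n.
    """
    probs = sorted(p.values(), reverse=True)
    return sum(probs[n:])


def naive_support_lower_bound(samples):
    """Count distinct observed elements (a baseline, not the paper's estimator)."""
    return len(set(samples))


if __name__ == "__main__":
    # Toy distribution chosen for illustration only.
    p = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}
    for n in (1, 2, 3, 4):
        print(f"distance to support size {n}: {distance_to_support_size(p, n):.2f}")
    print("naive lower bound:", naive_support_lower_bound(["a", "b", "a", "c"]))
```

A tester for problem (1) has to separate the case where this distance is zero from the case where it is at least the distance parameter, using samples alone rather than the probabilities themselves.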
Taxonomy
Topics: Online Learning and Analytics
