Sample complexity of the distinct elements problem
Yihong Wu, Pengkun Yang

TL;DR
This paper introduces an estimator for the distinct elements problem that achieves near-optimal sample complexity via polynomial approximation. It can be computed in linear time. It applies to sampling with replacement, and also to sampling without replacement when the sample is a small fraction of the urn.
Contribution
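It proposes a new estimator based on polynomial approximation and interpolation that attains the optimal sample complexity within $O(\log\log k)$ factors, where $k$ is the number of balls in the urn. It also establishes a sharp bound on the minimum singular values of real rectangular Vandermonde matrices.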
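To make the auxiliary quantity concrete, take nodes $x_1,\dots,x_m$ and degree $d$ with $m \ge d+1$. The real rectangular Vandermonde matrix has entries $V_{ij} = x_i^{\,j}$ for $j = 0,\dots,d$, and the bound concerns its smallest singular value $\sigma_{\min}(V)$. The following minimal Python sketch only computes $\sigma_{\min}$ numerically with NumPy. It does not reproduce the paper's bound, and the helper name `vandermonde_min_singular_value` and the node choices are illustrative assumptions.

```python
import numpy as np


def vandermonde_min_singular_value(nodes, degree):
    """Smallest singular value of the m x (degree + 1) Vandermonde matrix
    V[i, j] = nodes[i] ** j (increasing powers). Hypothetical helper."""
    nodes = np.asarray(nodes, dtype=float)
    if len(nodes) < degree + 1:
        raise ValueError("need at least degree + 1 nodes for a tall matrix")
    V = np.vander(nodes, N=degree + 1, increasing=True)
    # With compute_uv=False, NumPy returns singular values in descending order.
    return np.linalg.svd(V, compute_uv=False)[-1]


if __name__ == "__main__":
    m = 50
    uniform = np.linspace(-1.0, 1.0, m)    # spread-out nodes
    clustered = np.linspace(0.0, 0.1, m)   # nodes packed into a short interval
    for d in (2, 4, 8):
        print(d,
              vandermonde_min_singular_value(uniform, d),
              vandermonde_min_singular_value(clustered, d))
```

Appending a column can only shrink $\sigma_{\min}$ of a tall matrix. Clustering the nodes makes the columns nearly collinear. Both higher degree and tighter node spacing therefore push $\sigma_{\min}$ toward zero, which is the kind of ill-conditioning a sharp bound has to quantify.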
Findings
Achieves near-optimal sample complexity for the distinct elements problem.
Provides an estimator computable in $O(n)$ time, where $n$ is the number of samples.
Extends the results to sampling without replacement when the sample size is a vanishing fraction of the urn size.
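Suppose the estimator is a linear statistic of the sample's fingerprint $\Phi_j$, the number of colors observed exactly $j$ times. Then an $O(n)$ running time follows directly. Counting colors takes one pass over the $n$ samples, and the weighted sum takes one pass over at most $n$ distinct counts.

The following Python sketch illustrates this. The linear form and the names `fingerprint`, `linear_fingerprint_estimate`, and `g` are assumptions of the sketch. It does not implement the paper's polynomial-interpolation coefficients. With $g(j) = 1$ it reduces to the naive count of distinct colors seen.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def fingerprint(samples: Iterable[Hashable]) -> Counter:
    """Return Phi with Phi[j] = number of colors seen exactly j times.

    Counting the colors is one pass over the n samples; tallying the
    counts is one pass over at most n distinct colors, so O(n) overall.
    """
    color_counts = Counter(samples)
    return Counter(color_counts.values())


def linear_fingerprint_estimate(
    samples: Iterable[Hashable], g: Callable[[int], float]
) -> float:
    """Hypothetical linear estimator sum_j g(j) * Phi_j.

    The coefficient function g is supplied by the caller; this sketch does
    not derive the paper's polynomial-interpolation coefficients.
    """
    phi = fingerprint(samples)
    return sum(g(j) * phi_j for j, phi_j in phi.items())


if __name__ == "__main__":
    draws = ["red", "green", "red", "blue", "red", "green"]
    print(dict(fingerprint(draws)))
    # Plug-in coefficients g(j) = 1 count the distinct colors observed.
    print(linear_fingerprint_estimate(draws, lambda j: 1))
```

Passing `g=lambda j: j` recovers the sample size $n$, since $\sum_j j\,\Phi_j = n$. This is a quick sanity check on the fingerprint.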
Abstract
We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacement. Based on discrete polynomial approximation and interpolation, we propose an estimator with an additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for an accurate estimation. The result also applies to sampling without replacement provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.
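The two sampling models in the abstract can be compared by simulation. The following Python sketch builds an urn of $k$ balls and records how many distinct colors appear among $n$ draws, with and without replacement. The urn composition, trial count, and helper names (`make_urn`, `distinct_seen`) are hypothetical choices for illustration, not the paper's experiments. When $n$ is a small fraction of $k$, the two averages should be close. That is the regime in which the abstract says the result carries over to sampling without replacement.

```python
import random
from typing import Dict, Hashable, List


def make_urn(color_counts: Dict[Hashable, int]) -> List[Hashable]:
    """Flatten {color: multiplicity} into a list with one entry per ball."""
    return [color for color, mult in color_counts.items() for _ in range(mult)]


def distinct_seen(urn: List[Hashable], n: int, replace: bool,
                  rng: random.Random) -> int:
    """Number of distinct colors among n draws from the urn."""
    if replace:
        draws = rng.choices(urn, k=n)   # n independent uniform draws
    else:
        draws = rng.sample(urn, n)      # n distinct balls; requires n <= len(urn)
    return len(set(draws))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical urn: k = 10_000 balls, 2_000 colors with 5 balls each.
    urn = make_urn({c: 5 for c in range(2_000)})
    n, trials = 200, 200
    for replace in (True, False):
        avg = sum(distinct_seen(urn, n, replace, rng) for _ in range(trials)) / trials
        label = "with" if replace else "without"
        print(f"{label} replacement: mean distinct colors = {avg:.1f}")
```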
