On Convergence Rate of the Generalized Diversity Subsampling Method
Boyang Shang

TL;DR
This paper analyzes the convergence rate of the generalized Diversity Subsampling (g-DS) method, focusing on how density estimation errors affect its asymptotic performance when selecting samples from large datasets.
Contribution
It provides a theoretical analysis of the pointwise convergence rate of g-DS considering density estimation bias and variance, extending previous asymptotic results.
Findings
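The pointwise convergence rate of the g-DS subsample p.d.f. to the target p.d.f. depends on the pointwise bias and variance of the estimated data p.d.f.
Errors in the density estimation step, which is unavoidable on real-world data sets, therefore affect the asymptotic performance of g-DS.
Theoretical bounds on this convergence rate are established as the data set size approaches infinity.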
Abstract
arXiv:2206.10812v1 [stat.ME] proposes a useful algorithm, the generalized Diversity Subsampling (g-DS) algorithm, to select a subsample following some target probability distribution from a finite data set, and demonstrates its effectiveness numerically. While the asymptotic performance of g-DS when the true data distribution is known was discussed in arXiv:2206.10812v1 [stat.ME], it remains an interesting question how the estimation errors in the density estimation step, which is unavoidable when using g-DS on real-world data sets, influence its asymptotic performance. In this paper, we study the pointwise convergence rate of the probability density function (p.d.f.) of the g-DS subsample to the target p.d.f. value as the data set size approaches infinity, taking into account the pointwise bias and variance of the estimated data p.d.f.
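To make the role of the density estimation step concrete, the following is a minimal sketch of density-ratio subsampling in one dimension. It is not the g-DS algorithm from arXiv:2206.10812v1, whose details are not given here. It assumes a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth and weighted sampling without replacement (Efraimidis-Spirakis keys), and the function names `gaussian_kde_1d` and `weighted_subsample` are hypothetical. The point it illustrates is that each selection weight divides the target p.d.f. by the estimated data p.d.f., so any bias or variance in that estimate carries over to the distribution of the subsample.

```python
import math
import random
import statistics


def gaussian_kde_1d(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate built from `data`."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return pdf


def weighted_subsample(data, target_pdf, est_pdf, m, rng):
    """Draw up to m distinct points with weights target_pdf(x) / est_pdf(x).

    Assumption: weighted sampling without replacement via Efraimidis-Spirakis
    keys, used here only as a stand-in for the selection step.
    """
    keyed = []
    for x in data:
        w = target_pdf(x) / est_pdf(x)  # est_pdf > 0 at every data point
        if w > 0:
            u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
            keyed.append((math.log(u) / w, x))  # log of the key u**(1/w)
    keyed.sort(reverse=True)  # the largest keys are selected
    return [x for _, x in keyed[:m]]


# Illustrative data: standard normal draws; target: uniform on [-1, 1].
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-0.2)  # Silverman's rule
est_pdf = gaussian_kde_1d(data, bandwidth)


def target_pdf(x):
    return 0.5 if abs(x) <= 1.0 else 0.0


subsample = weighted_subsample(data, target_pdf, est_pdf, 100, rng)
print(len(subsample), min(subsample), max(subsample))
```

In this sketch, replacing `est_pdf` with the true standard normal density corresponds to the known-distribution setting, while using the kernel estimate corresponds to the setting with density estimation error that the paper analyzes.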
Taxonomy
Topics: Machine Learning and Algorithms · Survey Sampling and Estimation Techniques · Statistical Methods and Inference
