On Convergence Rate of the Generalized Diversity Subsampling Method

Boyang Shang

arXiv:2309.00636 · stat.ME · September 6, 2023

TL;DR

This paper analyzes the convergence rate of the generalized Diversity Subsampling (g-DS) method, focusing on how density estimation errors affect its asymptotic performance when selecting samples from large datasets.

Contribution

It provides a theoretical analysis of the pointwise convergence rate of g-DS considering density estimation bias and variance, extending previous asymptotic results.

Findings

01. Convergence rate depends on density estimation bias and variance.

02. Density estimation errors influence the asymptotic performance of g-DS.

03. Theoretical bounds on the convergence rate are established.

Abstract

arXiv:2206.10812v1 [stat.ME] proposes a useful algorithm, named the generalized Diversity Subsampling (g-DS) algorithm, that selects a subsample following a target probability distribution from a finite data set, and demonstrates its effectiveness numerically. While the asymptotic performance of g-DS when the true data distribution is known was discussed in arXiv:2206.10812v1 [stat.ME], it remains an interesting question how the estimation errors in the density estimation step, which is unavoidable when applying g-DS to real-world data sets, influence its asymptotic performance. In this paper, we study the pointwise convergence rate of the probability density function (p.d.f.) of the g-DS subsample to the target p.d.f. value as the data set size approaches infinity, taking into account the pointwise bias and variance of the estimated data p.d.f.
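
To make the role of the density estimation step concrete, the following is a minimal Python sketch of a generic density-ratio weighted subsampler. It is not the g-DS algorithm from arXiv:2206.10812, whose details are not given on this page. It only illustrates the general setting the abstract describes: the data density is unknown, it is replaced by an estimate, and the error in that estimate feeds directly into the selection probabilities. The function names, the Gaussian kernel, the rule-of-thumb bandwidth, and the uniform target on [-2, 2] are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kde_1d(data, query, bandwidth):
    """Gaussian kernel density estimate built from `data`, evaluated at `query`."""
    z = (query[:, None] - data[None, :]) / bandwidth
    norm = len(data) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def weighted_subsample(data, target_pdf, n_sub, bandwidth, rng):
    """Draw n_sub points without replacement, with weights
    proportional to target_pdf(x) / estimated_data_pdf(x).

    Illustrative stand-in only; this is NOT the g-DS selection rule.
    """
    f_hat = gaussian_kde_1d(data, data, bandwidth)       # estimated data p.d.f.
    weights = target_pdf(data) / np.maximum(f_hat, 1e-12)  # guard against division by zero
    probs = weights / weights.sum()
    idx = rng.choice(len(data), size=n_sub, replace=False, p=probs)
    return data[idx]


def uniform_pdf(x):
    """Hypothetical target: uniform density on [-2, 2]."""
    return np.where(np.abs(x) <= 2.0, 0.25, 0.0)


rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=2000)                # data drawn from N(0, 1)
bandwidth = 1.06 * data.std() * len(data) ** (-0.2)   # rule-of-thumb bandwidth (assumption)
sub = weighted_subsample(data, uniform_pdf, 200, bandwidth, rng)
```

In this sketch, any pointwise bias or variance in `f_hat` changes `weights` at that point, so the density of the selected subsample deviates from the target by an amount tied to the quality of the density estimate. This is the kind of dependence the paper quantifies for g-DS.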

Taxonomy

Topics: Machine Learning and Algorithms · Survey Sampling and Estimation Techniques · Statistical Methods and Inference