Scalable subsampling: computation, aggregation and inference

Dimitris N. Politis

arXiv:2112.06434·math.ST·December 14, 2021

TL;DR

This paper introduces a scalable subsampling and subagging method for statistical inference that is computationally feasible with large datasets, providing effective distribution estimation and confidence interval construction.

Contribution

It proposes a non-random subsampling approach for scalable distribution estimation and subagging, improving computational efficiency in big data contexts.

Findings

01. Non-random subsamples enable effective distribution estimation.

02. Scalable subagging can match or outperform traditional estimators.

03. Method facilitates confidence interval construction in large datasets.

Abstract

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat{\theta}_{n}$ in order to conduct nonparametric inference such as the construction of confidence intervals and hypothesis tests. Subsampling has seen a resurgence in the Big Data era where the standard, full-resample size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size $b$ can be computationally challenging with both $b$ and the sample size $n$ being very large. In the paper at hand, we show how a set of appropriately chosen, non-random subsamples can be used to conduct effective -- and computationally feasible -- distribution estimation via subsampling. Further, we show how the same set of subsamples can be used to yield a procedure for subsampling aggregation -- also known as subagging -- that is…
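The idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the non-random subsamples are chosen, so the code below assumes one natural choice: consecutive, non-overlapping blocks of size $b$. It also assumes a $\sqrt{n}$ convergence rate and centers the subsample statistics at their own average (the subagging estimate) rather than at the full-sample $\hat{\theta}_{n}$. The function name `scalable_subsampling` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def scalable_subsampling(x, b, stat=np.mean, alpha=0.05):
    """Sketch of subsampling with non-random, non-overlapping blocks.

    Assumptions (not taken from the abstract): blocks are consecutive
    and disjoint, and the statistic converges at rate sqrt(n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    q = n // b  # number of complete blocks
    if q < 2:
        raise ValueError("need at least two blocks of size b")

    # Deterministic subsamples: rows are consecutive blocks of length b.
    blocks = x[: q * b].reshape(q, b)

    # Statistic evaluated on each block separately (cheap, parallelizable).
    theta_b = np.array([stat(block) for block in blocks])

    # Subagging estimate: the average of the block-level statistics.
    theta_bar = theta_b.mean()

    # Subsampling roots approximate the law of sqrt(n) * (estimate - theta).
    roots = np.sqrt(b) * (theta_b - theta_bar)

    # Equal-tailed interval obtained by inverting the root quantiles.
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    ci = (theta_bar - hi_q / np.sqrt(n), theta_bar - lo_q / np.sqrt(n))
    return theta_bar, roots, ci
```

Each block is processed independently, so no random index generation over the full dataset is required and the per-block computations can be distributed. For statistics with a different convergence rate, the `np.sqrt` scalings would need to be replaced accordingly.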

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Statistical Process Monitoring · Statistical Methods and Inference