Stability via resampling: statistical problems beyond the real line

Jake A. Soloff; Rina Foygel Barber; Rebecca Willett

arXiv:2405.09511 · math.ST · May 28, 2024

Open Access

TL;DR

This paper proves a general finite-sample stability guarantee for bagging (model averaging over bootstrap samples or subsamples) when the base algorithm returns outputs in a general space, and assesses that stability empirically on synthetic and real-world data.

Contribution

It extends stability guarantees for bagging to algorithms whose outputs are not real-valued, such as a vector of weights or a density function, covering a broad range of statistical applications.

Findings

01 Bagging is stable in general output spaces under the proposed theoretical conditions.

02 Empirical results confirm stability across diverse data types and problem settings.

03 Resampling methods improve the robustness of statistical algorithms in practice.

Abstract

Model averaging techniques based on resampling methods (such as bootstrapping or subsampling) have been utilized across many areas of statistics, often with the explicit goal of promoting stability in the resulting output. We provide a general, finite-sample theoretical result guaranteeing the stability of bagging when applied to algorithms that return outputs in a general space, so that the output is not necessarily real-valued; for example, an algorithm may estimate a vector of weights or a density function. We empirically assess the stability of bagging on synthetic and real-world data for a range of problem settings, including causal inference, nonparametric regression, and Bayesian model selection.
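
To make the setting concrete, here is a minimal Python sketch of bagging by subsampling applied to an algorithm with a vector-valued output. It is an illustration only, not the paper's code or its formal stability definition. The base algorithm (a one-hot weight vector on the coordinate with the largest sample mean), the function names `base_algorithm`, `subbag`, and `loo_perturbations`, the subsample size of n/2, and the use of Euclidean distance under leave-one-out deletion are all assumptions chosen for the example.

```python
import numpy as np


def base_algorithm(data):
    """Hypothetical unstable base algorithm (an assumption for this sketch):
    return a one-hot weight vector on the column with the largest sample mean."""
    means = data.mean(axis=0)
    weights = np.zeros(data.shape[1])
    weights[np.argmax(means)] = 1.0
    return weights


def subbag(data, algorithm, n_bags=200, m=None, seed=0):
    """Average the algorithm's outputs over n_bags subsamples of size m,
    each drawn without replacement (m defaults to half the sample size)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    m = n // 2 if m is None else m
    outputs = [
        algorithm(data[rng.choice(n, size=m, replace=False)])
        for _ in range(n_bags)
    ]
    return np.mean(outputs, axis=0)


def loo_perturbations(data, fit):
    """Euclidean distance between the output on the full data set and the
    output after deleting row i, for every i."""
    full = fit(data)
    return np.array([
        np.linalg.norm(full - fit(np.delete(data, i, axis=0)))
        for i in range(data.shape[0])
    ])


# Synthetic data: 40 observations of 3 coordinates with equal true means,
# so the argmax in the base algorithm is sensitive to single observations.
rng = np.random.default_rng(1)
data = rng.normal(size=(40, 3))

bagged = subbag(data, base_algorithm)
base_changes = loo_perturbations(data, base_algorithm)
bag_changes = loo_perturbations(data, lambda d: subbag(d, base_algorithm))

print("bagged weight vector:", bagged)
print("largest leave-one-out change, base algorithm:", base_changes.max())
print("largest leave-one-out change, bagged algorithm:", bag_changes.max())
```

The base algorithm's output can only stay the same or jump to a different one-hot vector when an observation is removed, whereas the bagged output is an average of such vectors and can move by intermediate amounts. Comparing the two printed maxima gives a rough, informal view of the kind of stability the abstract refers to.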

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Process Monitoring