Estimating a sharp convergence bound for randomized ensembles

Miles E. Lopes

arXiv:1303.0727·math.PR·May 1, 2019

TL;DR

This paper studies a sharp theoretical upper bound on the algorithmic variance of prediction error in randomized ensemble classifiers such as bagging and random forests, and proposes an estimator for this bound with optimal non-parametric rates.

Contribution

It introduces a precise upper bound on the algorithmic variance of ensemble classifiers and develops an estimator for this bound with optimal convergence properties.

Findings

01. The upper bound on variance is sharp and attainable.

02. The proposed estimator achieves optimal non-parametric MSE rates.

03. The analysis quantifies how the prediction error of majority-vote ensembles converges as the number of classifiers increases.

Abstract

When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship between prediction error and ensemble size is unknown in practice. In the standard case when classifiers are aggregated by majority vote, the present work offers a way to quantify this convergence in terms of "algorithmic variance," i.e. the variance of prediction error due only to the randomized training algorithm. Specifically, we study a theoretical upper bound on this variance, and show that it is sharp, in the sense that it is attained by a specific family of randomized classifiers. Next, we address the problem of estimating the unknown value of the bound, which leads to a unique twist on the classical problem of non-parametric density…
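
As a rough illustration of the "algorithmic variance" described in the abstract, the Python sketch below simulates many independent majority-vote ensembles on a fixed test set and measures how the variance of the test error shrinks as the ensemble grows. It is a toy model and not the paper's method or estimator: the base classifier (a random threshold on a one-dimensional input), the threshold range, the test grid, and the function names `ensemble_error` and `algorithmic_variance` are all assumptions made for illustration.

```python
import random
import statistics


def ensemble_error(t, test_points, labels, rng):
    """Test error of one majority-vote ensemble of t random threshold classifiers."""
    # Hypothetical base classifier: predict 1 when x exceeds a random threshold.
    # The only randomness is in the thresholds, standing in for a randomized
    # training algorithm applied to fixed training data.
    thresholds = [rng.uniform(0.2, 0.8) for _ in range(t)]
    errors = 0
    for x, y in zip(test_points, labels):
        votes = sum(1 for c in thresholds if x > c)
        pred = 1 if 2 * votes > t else 0  # majority vote (t is assumed odd)
        errors += int(pred != y)
    return errors / len(test_points)


def algorithmic_variance(t, n_ensembles=200, seed=0):
    """Mean and variance of the test error across independently drawn ensembles."""
    rng = random.Random(seed)
    # Fixed test set: a grid on (0, 1) with true label 1 when x > 0.5.
    test_points = [(i + 0.5) / 200 for i in range(200)]
    labels = [1 if x > 0.5 else 0 for x in test_points]
    errs = [ensemble_error(t, test_points, labels, rng) for _ in range(n_ensembles)]
    return statistics.mean(errs), statistics.pvariance(errs)


if __name__ == "__main__":
    for t in (1, 11, 101):
        mean_err, var_err = algorithmic_variance(t)
        print(f"t={t:4d}  mean error={mean_err:.4f}  variance={var_err:.6f}")
```

Because the test set and the true labels are held fixed, any spread in the error across runs comes only from the random thresholds, which is the sense of variance "due only to the randomized training algorithm" used in the abstract. In this toy setting both the mean error and its variance should decrease as `t` increases.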


Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models