Estimating a sharp convergence bound for randomized ensembles
Miles E. Lopes

TL;DR
This paper derives a sharp theoretical upper bound on the algorithmic variance of prediction error for randomized ensemble classifiers such as bagging and random forests, and proposes an estimator of this bound that attains optimal non-parametric rates.
Contribution
The paper studies a sharp upper bound on the algorithmic variance of majority-vote ensemble classifiers and develops an estimator for this bound with optimal convergence properties.
Findings
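The upper bound on algorithmic variance is sharp: it is attained by a specific family of randomized classifiers.
The proposed estimator of the bound achieves optimal non-parametric mean-squared error (MSE) rates.
The bound quantifies how the prediction error of a majority-vote ensemble converges as the number of classifiers grows.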
Abstract
When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship between prediction error and ensemble size is unknown in practice. In the standard case when classifiers are aggregated by majority vote, the present work offers a way to quantify this convergence in terms of "algorithmic variance," i.e. the variance of prediction error due only to the randomized training algorithm. Specifically, we study a theoretical upper bound on this variance, and show that it is sharp --- in the sense that it is attained by a specific family of randomized classifiers. Next, we address the problem of estimating the unknown value of the bound, which leads to a unique twist on the classical problem of non-parametric density…
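To make the notion of "algorithmic variance" concrete, the following is a minimal sketch that holds the training and test data fixed, retrains a bagged majority-vote ensemble several times, and measures how much the test error varies across those retrainings. It is an illustration of the quantity only, not the paper's bound or its estimator. The synthetic data, the decision-stump base classifier, and all function names (`make_data`, `fit_stump`, `train_ensemble`, `ensemble_error`, `algorithmic_variance`) are assumptions introduced here.

```python
import random
import statistics

def make_data(n, rng):
    # Hypothetical 1-D data: label is 1 when x > 0, flipped with probability 0.2.
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

def fit_stump(sample):
    # Choose the threshold t (among the sample's x values) that minimizes
    # the training error of the rule "predict 1 if x > t".
    best_t, best_err = 0.0, float("inf")
    for t, _ in sample:
        err = sum(int(x > t) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def train_ensemble(train, size, rng):
    # Bagging: each stump is fit on a bootstrap resample of the fixed training set.
    n = len(train)
    return [fit_stump([train[rng.randrange(n)] for _ in range(n)])
            for _ in range(size)]

def ensemble_error(thresholds, test):
    # Majority vote over the stumps; a tie (even-sized ensemble) predicts 0.
    errors = 0
    for x, y in test:
        votes = sum(x > t for t in thresholds)
        pred = int(2 * votes > len(thresholds))
        errors += int(pred != y)
    return errors / len(test)

def algorithmic_variance(train, test, size, runs, rng):
    # The data stay fixed; only the randomness of the training algorithm
    # (the bootstrap resampling) changes from run to run.
    errs = [ensemble_error(train_ensemble(train, size, rng), test)
            for _ in range(runs)]
    return statistics.pvariance(errs)

rng = random.Random(0)
train = make_data(40, rng)
test = make_data(200, rng)
for size in (1, 5, 25):  # odd sizes avoid voting ties
    var = algorithmic_variance(train, test, size, runs=30, rng=rng)
    print(f"ensemble size {size:2d}: estimated algorithmic variance {var:.6f}")
```

Because the training and test sets are drawn once and reused, any spread in the printed values comes only from the bootstrap randomness, which is the sense of "variance of prediction error due only to the randomized training algorithm" used in the abstract. This Monte Carlo approach requires retraining the whole ensemble many times, so it serves only to show what is being measured.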
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
