A Characterization of Mean Squared Error for Estimator with Bagging

Martin Mihelich; Charles Dognin; Yan Shu; Michael Blot

arXiv:1908.02718·cs.LG·August 8, 2019·6 cites

TL;DR

This paper analyzes theoretically how bagging affects the Mean Squared Error (MSE) of a statistical estimator, with a focus on the unbiased sample variance, and identifies the conditions under which bagging improves or worsens performance.

Contribution

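It proves that, for any estimator, increasing the number of bagged estimators $N$ in the average can only reduce the MSE (it never increases it), and it derives an exact analytical expression of the MSE for the bagged unbiased sample variance, highlighting the role of the kurtosis of the distribution.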
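
One way to see why the MSE cannot grow with $N$ is a conditioning argument. The sketch below is not necessarily the paper's proof. It assumes that, given the sample $X$, the $N$ bags are drawn independently and identically, and that second moments are finite. The symbols $\hat\theta^{*}_{k}$, $\mu^{*}$ and $v^{*}$ are notation introduced here for illustration.

```latex
% Assumed notation: X is the observed sample, \theta the target quantity,
% \hat\theta^{*}_{k} the estimate computed on the k-th bag.
\hat\theta_{\mathrm{bag}}^{(N)} = \frac{1}{N}\sum_{k=1}^{N}\hat\theta^{*}_{k},
\qquad
\mu^{*}(X) = \mathbb{E}\big[\hat\theta^{*}_{k}\mid X\big],
\qquad
v^{*}(X) = \mathrm{Var}\big(\hat\theta^{*}_{k}\mid X\big).

% Condition on X, then average over X (bias-variance split given X):
\mathrm{MSE}(N)
  = \mathbb{E}\Big[\mathbb{E}\big[(\hat\theta_{\mathrm{bag}}^{(N)}-\theta)^{2}\mid X\big]\Big]
  = \mathbb{E}\big[(\mu^{*}(X)-\theta)^{2}\big]
    + \frac{1}{N}\,\mathbb{E}\big[v^{*}(X)\big].
```

Under these assumptions the first term does not depend on $N$ and the second is non-negative and proportional to $1/N$, so the MSE is non-increasing in $N$. Whether bagging beats the original estimator depends on the first term, which is where the bag size and the distribution enter.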

Findings

01

For any estimator, increasing the number of bagged estimators $N$ can only reduce the MSE.

02

Bagging can reduce the MSE of the unbiased sample variance only if the kurtosis of the distribution is greater than 1.5.

03

Proposes a new algorithm for high-precision variance estimation.

Abstract

Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Though bagging is now widely used in practice and many empirical studies have explored its behavior, we still know little about the theoretical properties of bagged predictions. In this paper, we theoretically investigate how the bagging method can reduce the Mean Squared Error (MSE) when applied to a statistical estimator. First, we prove that for any estimator, increasing the number of bagged estimators $N$ in the average can only reduce the MSE. This intuitive result, observed empirically and discussed in the literature, has not yet been rigorously proved. Second, we focus on the standard estimator of variance, the unbiased sample variance, and we develop an exact analytical expression of the MSE for this estimator with bagging. This allows us…
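
The setting described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' algorithm. It assumes each bag holds `bag_size` points drawn with replacement from the sample, and it uses standard Gaussian data (true variance 1) as an arbitrary test distribution. The function names `bagged_variance` and `monte_carlo_mse` are hypothetical.

```python
import random
import statistics


def bagged_variance(sample, n_bags, bag_size, rng):
    """Average the unbiased sample variance over n_bags resamples.

    Assumption: each bag holds bag_size points drawn with replacement.
    """
    if bag_size < 2:
        raise ValueError("bag_size must be at least 2")
    total = 0.0
    for _ in range(n_bags):
        bag = rng.choices(sample, k=bag_size)  # draw with replacement
        total += statistics.variance(bag)      # unbiased (n - 1) variance
    return total / n_bags


def monte_carlo_mse(n_bags, bag_size, sample_size, trials, seed=0):
    """Estimate the MSE of the bagged estimator on N(0, 1) data."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimate = bagged_variance(sample, n_bags, bag_size, rng)
        squared_errors.append((estimate - 1.0) ** 2)  # true variance is 1
    return sum(squared_errors) / trials


if __name__ == "__main__":
    # Compare a few values of N at fixed bag size and sample size.
    for n_bags in (1, 5, 25):
        mse = monte_carlo_mse(n_bags, bag_size=20, sample_size=20, trials=2000)
        print(n_bags, mse)
```

Varying `n_bags` and `bag_size` here corresponds to the $N$ and bag size discussed in the abstract. Swapping `rng.gauss` for another generator gives a rough way to probe how the outcome changes with the distribution. The Monte Carlo estimates are noisy, so small differences between runs should not be over-read.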

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference