On the Current State of Research in Explaining Ensemble Performance Using Margins

Waldyn Martinez; J. Brian Gray

arXiv:1906.03123 · stat.ML · June 10, 2019 · 1 citation

TL;DR

This paper reviews and empirically tests current theories on how margins influence ensemble classifier performance, evaluating whether larger margins correlate with better generalization across various datasets.

Contribution

The paper proposes techniques for analyzing margin-based explanations of ensemble performance and empirically assesses their validity through experiments on real and simulated data.

Findings

1. Larger margins tend to correlate with lower generalization error.
2. Increasing the mean and decreasing the variance of the margins can improve ensemble performance.
3. Empirical results support some theoretical bounds but also highlight their limitations.
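The quantities in these findings can be made concrete with a minimal sketch. This is not the paper's code. It assumes binary labels in {-1, +1}, base classifiers that output -1 or +1, and nonnegative voting weights. The function names `ensemble_margins`, `margin_summary` and `margin_cdf` are hypothetical. The sketch computes the normalized margin y f(x) of a weighted-vote ensemble for each training example, the mean and variance of those margins, and the fraction of examples whose margin is at or below a threshold θ.

```python
from statistics import mean, pvariance
from typing import Sequence


def ensemble_margins(
    votes: Sequence[Sequence[int]],
    weights: Sequence[float],
    labels: Sequence[int],
) -> list[float]:
    """Normalized margin y * sum_t(w_t * h_t(x)) / sum_t(w_t) for each example.

    Assumption: votes[t][i] is the +1/-1 prediction of base classifier t on
    example i, weights[t] is its nonnegative voting weight, labels[i] is +1/-1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("voting weights must sum to a positive value")
    margins = []
    for i, y in enumerate(labels):
        # Weighted vote for example i, rescaled so the margin lies in [-1, 1].
        score = sum(w * h[i] for w, h in zip(weights, votes))
        margins.append(y * score / total)
    return margins


def margin_summary(margins: Sequence[float]) -> tuple[float, float]:
    """Mean and population variance of the training margins."""
    return mean(margins), pvariance(margins)


def margin_cdf(margins: Sequence[float], theta: float) -> float:
    """Empirical fraction of examples whose margin is <= theta."""
    return sum(m <= theta for m in margins) / len(margins)


if __name__ == "__main__":
    # Toy ensemble (illustrative data): 3 base classifiers, 4 training examples.
    votes = [[1, 1, -1, 1], [1, -1, -1, 1], [1, 1, 1, -1]]
    weights = [0.5, 0.3, 0.2]
    labels = [1, 1, -1, 1]
    m = ensemble_margins(votes, weights, labels)
    print("margins:", m)
    print("mean, variance:", margin_summary(m))
    print("fraction with margin <= 0.5:", margin_cdf(m, 0.5))
```

Under this normalization, a positive margin means the weighted vote classifies the example correctly, and its magnitude measures how decisive the vote was. Raising the mean while shrinking the variance moves the whole margin distribution away from zero, which is the kind of change the second finding refers to.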

Abstract

Empirical evidence shows that ensembles, such as bagging, boosting, random and rotation forests, generally perform better in terms of their generalization error than individual classifiers. To explain this performance, Schapire et al. (1998) developed an upper bound on the generalization error of an ensemble based on the margins of the training data, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal. Many other researchers have backed this assumption and presented tighter bounds on the generalization error based on either the margins or functions of the margins. For instance, Shen and Li (2010) provide evidence suggesting that the generalization error of a voting classifier might be reduced by increasing the mean and decreasing the variance of the margins. In this article we propose several techniques and empirically…
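For reference, the margin bound attributed above to Schapire et al. (1998) can be written informally as follows. The statement assumes binary labels y ∈ {-1, +1}, a voting classifier f that is a convex combination of base classifiers h_t drawn from a class of VC dimension d, and a training sample S of m examples drawn from a distribution D. The bound holds simultaneously for all θ > 0 with probability at least 1 - δ over the draw of S, and constants are suppressed in the O(·) term. The notation is chosen for illustration; the exact conditions are given in the original paper.

```latex
f(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \qquad \alpha_t \ge 0, \quad \sum_{t=1}^{T} \alpha_t = 1, \quad h_t(x) \in \{-1, +1\}

\operatorname{margin}(x, y) = y\, f(x) \in [-1, 1]

\Pr_{D}\!\left[ y f(x) \le 0 \right] \;\le\; \Pr_{S}\!\left[ y f(x) \le \theta \right]
  + O\!\left( \frac{1}{\sqrt{m}} \left( \frac{d \log^{2}(m/d)}{\theta^{2}} + \log\frac{1}{\delta} \right)^{1/2} \right)
```

The first term on the right is the fraction of training examples with margin at most θ. Shifting the training margins upward, with everything else held equal, lowers that term, which is the reasoning behind the conclusion that larger margins should lead to lower generalization error.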

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Neural Networks and Applications