Optimal trees selection for classification via out-of-bag assessment and sub-bagging

Zardad Khan; Naz Gul; Nosheen Faiz; Asma Gul; Werner Adler; Berthold Lausen

arXiv:2012.15301·stat.ML·January 1, 2021

TL;DR

This paper proposes modified tree selection methods for the optimal trees ensemble (OTE) that use out-of-bag assessment and sub-bagging, improving predictive performance across multiple datasets.

Contribution

It introduces two tree selection approaches that compensate for the training observations OTE loses to internal validation, improving ensemble accuracy.

Findings

01 Modified methods outperform traditional OTE.

02 Improved ensemble accuracy on benchmark datasets.

03 Effective use of out-of-bag and sub-bagging techniques.

Abstract

The effect of training data size on machine learning methods has been well investigated over the past two decades. The predictive performance of tree-based machine learning methods, in general, improves at a decreasing rate as the size of the training data increases. We investigate this in optimal trees ensemble (OTE), where the method fails to learn from some of the training observations due to internal validation. Modified tree selection methods are thus proposed for OTE to cater for the loss of training observations in internal validation. In the first method, corresponding out-of-bag (OOB) observations are used in both individual and collective performance assessment for each tree. Trees are ranked based on their individual performance on the OOB observations. A certain number of top ranked trees is selected and starting from the most accurate tree, subsequent trees are added one by…
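The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: one-split decision stumps stand in for full trees, the toy data is synthetic, and every name (`bootstrap_indices`, `fit_stump`, `rank_by_oob_error`, `make_toy_data`) is hypothetical. It covers only the part stated above: each tree is grown on a bootstrap sample and scored on the observations left out of that sample (roughly 37% of the data per bootstrap draw), and trees are then ranked by that out-of-bag error.

```python
import random


def bootstrap_indices(n, rng):
    """Draw a bootstrap sample; return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob


def fit_stump(X, y, rows, feature):
    """Fit a one-split stump on a single feature (assumed stand-in for a tree)."""
    best = None
    for threshold in sorted({X[i][feature] for i in rows}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if X[i][feature] <= threshold else 1 - left_label) != y[i]
                for i in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x[feature] <= threshold else 1 - left_label


def rank_by_oob_error(X, y, n_trees=20, seed=0):
    """Grow stumps on bootstrap samples and sort them by out-of-bag error."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    ranked = []
    for _ in range(n_trees):
        in_bag, oob = bootstrap_indices(n, rng)
        if not oob:  # no held-out observations to score this stump on
            continue
        feature = rng.randrange(p)  # random feature choice, as an assumption
        predict = fit_stump(X, y, in_bag, feature)
        oob_error = sum(predict(X[i]) != y[i] for i in oob) / len(oob)
        ranked.append((oob_error, predict, oob))
    ranked.sort(key=lambda item: item[0])  # most accurate stump first
    return ranked


def make_toy_data(n=60, seed=1):
    """Synthetic binary data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    return X, y


X, y = make_toy_data()
ranked = rank_by_oob_error(X, y)
errors = [oob_error for oob_error, _, _ in ranked]
print("OOB errors, best first:", [round(e, 2) for e in errors[:5]])
```

Because each stump is scored only on observations it never saw during fitting, no separate validation split is needed, which is the motivation the abstract gives for using OOB observations.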
