WildWood: a new Random Forest algorithm

Stéphane Gaïffas; Ibrahim Merad; Yiyang Yu

arXiv:2109.08010 · cs.LG · June 14, 2023 · 1 citation

TL;DR

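WildWood (WW) is a Random Forest type ensemble algorithm that improves the predictions of each tree by aggregating the predictions of all its possible subtrees, using exponential weights computed on out-of-bag samples. Combined with a histogram strategy for split finding, this makes WW fast and competitive with well-established ensemble methods.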

Contribution

WildWood introduces an exact and efficient aggregation over all subtrees of each fully grown tree in a Random Forest, using exponential weights computed on out-of-bag samples with the context tree weighting algorithm. A histogram strategy accelerates split finding.

Findings

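01. WW is competitive with well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.

02. Aggregating the predictions of all possible subtrees of each fully grown tree, with exponential weights computed on out-of-bag samples, improves predictions.

03. A histogram strategy accelerates split finding, which makes WW fast.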
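
The histogram strategy named in finding 03 can be illustrated with a minimal sketch. It assumes a regression target, a single scalar feature, and a sum-of-squares (variance reduction) gain. The function names quantile_edges and best_histogram_split are hypothetical and are not taken from the WildWood code base. The point is only that, once samples are binned, candidate splits are scanned over a small number of bin edges rather than over every distinct feature value.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior bin edges taken at evenly spaced ranks of the sorted values."""
    s = sorted(values)
    return sorted({s[(k * len(s)) // n_bins] for k in range(1, n_bins)})


def best_histogram_split(x, y, n_bins=16):
    """Return (threshold, gain) of the best split 'x < threshold', or None.

    Assumption: gain is the reduction in the sum of squared errors obtained
    by predicting the mean of y on each side of the split.
    """
    edges = quantile_edges(x, n_bins)
    counts = [0] * (len(edges) + 1)
    sums = [0.0] * (len(edges) + 1)

    # One pass over the samples fills the histogram.
    for xi, yi in zip(x, y):
        b = bisect_right(edges, xi)  # bin index = number of edges <= xi
        counts[b] += 1
        sums[b] += yi

    n_total, s_total = sum(counts), sum(sums)
    best = None
    n_left, s_left = 0, 0.0

    # One pass over the bins scans every candidate threshold.
    for b, edge in enumerate(edges):
        n_left += counts[b]
        s_left += sums[b]
        n_right = n_total - n_left
        if n_left == 0 or n_right == 0:
            continue
        s_right = s_total - s_left
        gain = s_left**2 / n_left + s_right**2 / n_right - s_total**2 / n_total
        if best is None or gain > best[1]:
            best = (edge, gain)
    return best


# Toy data: a step function with its jump at 0.5.
x = [i / 1000 for i in range(1000)]
y = [1.0 if xi >= 0.5 else 0.0 for xi in x]
print(best_histogram_split(x, y, n_bins=16))
```

On this toy data the selected threshold is the bin edge at 0.5, found after scanning 15 candidate edges instead of 999 distinct values. A practical implementation would bin all features once before growing the trees and would vectorize or compile both passes.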

Abstract

We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights computed over out-of-bag samples, that are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
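
The aggregation over all subtrees described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a prior that applies a factor 1/2 to each stop-or-split decision at a node that is internal in the fully grown tree, a learning rate eta, and a scalar feature. The Node class and all function names are hypothetical. The total weight of all prunings below each node is computed in one bottom-up pass, in the spirit of context tree weighting, and a brute-force enumeration is included only to check the recursion on a toy tree.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    pred: float                    # prediction made if the tree is cut at this node
    oob_loss: float                # cumulative loss of `pred` on out-of-bag samples
    threshold: float = 0.0         # scalar split: x <= threshold goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    wden: float = 0.0              # total weight of all prunings rooted here

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def compute_weights(node: Node, eta: float) -> float:
    """Bottom-up pass: store in node.wden the summed weight of all prunings."""
    w = math.exp(-eta * node.oob_loss)
    if node.is_leaf:
        node.wden = w
    else:
        node.wden = 0.5 * w + 0.5 * (
            compute_weights(node.left, eta) * compute_weights(node.right, eta)
        )
    return node.wden


def aggregated_prediction(root: Node, x: float, eta: float) -> float:
    """Weighted average over all prunings; call compute_weights(root, eta) first."""
    path = [root]
    while not path[-1].is_leaf:
        v = path[-1]
        path.append(v.left if x <= v.threshold else v.right)
    pred = path[-1].pred
    # Walk back up: mix each node's own prediction with the one from below.
    for v in reversed(path[:-1]):
        alpha = 0.5 * math.exp(-eta * v.oob_loss) / v.wden
        pred = alpha * v.pred + (1.0 - alpha) * pred
    return pred


def brute_force_prediction(root: Node, x: float, eta: float) -> float:
    """Explicitly enumerate every pruning (exponential cost, toy trees only)."""
    def prunings(v):
        w = math.exp(-eta * v.oob_loss)
        if v.is_leaf:
            return [(w, v.pred)]
        out = [(0.5 * w, v.pred)]  # cut the tree at v
        go_left = x <= v.threshold
        for wl, pl in prunings(v.left):
            for wr, pr in prunings(v.right):
                out.append((0.5 * wl * wr, pl if go_left else pr))
        return out

    items = prunings(root)
    return sum(w * p for w, p in items) / sum(w for w, _ in items)


# Toy tree with made-up predictions and out-of-bag losses.
root = Node(0.5, 4.0, threshold=0.5,
            left=Node(0.2, 1.0, threshold=0.25,
                      left=Node(0.0, 0.3), right=Node(0.4, 0.5)),
            right=Node(0.9, 0.8))
eta = 1.0
compute_weights(root, eta)
for x in (0.1, 0.3, 0.7):
    print(x, aggregated_prediction(root, x, eta), brute_force_prediction(root, x, eta))
```

The two predictions printed on each line should agree up to floating-point rounding, while the recursive version only visits the nodes on the path of x. For deep trees, an implementation would typically keep the weights in log scale to avoid underflow in exp(-eta * loss).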

Code & Models

Repositories

pyensemble/wildwood (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Neural Networks and Applications