Node harvest

Nicolai Meinshausen

arXiv:0910.2145 · stat.ML · January 10, 2011

TL;DR

Node harvest combines the interpretability of single trees with the predictive power of tree ensembles by selecting a sparse set of nodes through quadratic programming. Its accuracy is competitive, especially on low signal-to-noise data.

Contribution

The paper introduces a simple, sparse, and interpretable ensemble method that selects the relevant nodes automatically, without requiring tuning parameters, and so balances interpretability against accuracy.

Findings

01 Achieves competitive predictive accuracy on a variety of datasets.

02 Produces sparse, interpretable models in which only a few nodes are selected.

03 Handles mixed data types and missing values.

Abstract

When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests provide usually more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as 'black box' predictors. Node harvest is trying to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a…
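To make the idea of weighting randomly generated nodes concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's algorithm: it uses a single one-dimensional predictor, random intervals in place of tree nodes, and nonnegative least squares solved by coordinate descent in place of the paper's quadratic program, whose exact constraints are not shown in this excerpt. All function names (`make_random_nodes`, `node_means`, `fit_weights`, `predict`) are hypothetical.

```python
import random


def make_random_nodes(xs, n_nodes, rng):
    """Draw random intervals on a 1-D predictor (a stand-in for tree nodes).

    The first node is the root interval covering all observations.
    """
    nodes = [(min(xs), max(xs))]
    while len(nodes) < n_nodes:
        a, b = sorted(rng.sample(xs, 2))
        nodes.append((a, b))
    return nodes


def node_means(xs, ys, nodes):
    """Mean response of the training observations that fall in each node."""
    means = []
    for lo, hi in nodes:
        members = [y for x, y in zip(xs, ys) if lo <= x <= hi]
        means.append(sum(members) / len(members))
    return means


def fit_weights(xs, ys, nodes, means, n_sweeps=200):
    """Nonnegative least squares by cyclic coordinate descent (assumed solver).

    Minimises sum_i (y_i - sum_g w_g * mean_g * 1[x_i in node g])^2 with w >= 0.
    """
    # Column g holds the node mean where x_i falls in node g, and 0 elsewhere.
    cols = [[m if lo <= x <= hi else 0.0 for x in xs]
            for (lo, hi), m in zip(nodes, means)]
    w = [0.0] * len(nodes)
    resid = list(ys)
    for _ in range(n_sweeps):
        for g, col in enumerate(cols):
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            step = sum(c * r for c, r in zip(col, resid)) / norm
            new_w = max(0.0, w[g] + step)  # project onto w_g >= 0
            delta = new_w - w[g]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[g] = new_w
    return w


def predict(x, nodes, means, w):
    """Weighted sum of the mean responses of the nodes that contain x."""
    return sum(wg * m for (lo, hi), m, wg in zip(nodes, means, w)
               if lo <= x <= hi)


# Toy data: a noisy step function on [0, 1].
rng = random.Random(0)
xs = [rng.random() for _ in range(60)]
ys = [(1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.3) for x in xs]

nodes = make_random_nodes(xs, 30, rng)
means = node_means(xs, ys, nodes)
weights = fit_weights(xs, ys, nodes, means)
selected = [(node, wg) for node, wg in zip(nodes, weights) if wg > 0.0]
print(len(selected), "of", len(nodes), "nodes received a nonzero weight")
```

The nonnegativity constraint alone tends to set many weights exactly to zero, which gives a rough feel for how a sparse node selection can arise without an explicit sparsity penalty. A faithful implementation would generate nodes from trees and use the constraints stated in the full paper.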
