
TL;DR
Node harvest combines the interpretability of single trees with the predictive power of tree ensembles by selecting a sparse set of nodes via quadratic programming. Its accuracy is competitive, especially on low signal-to-noise data.
Contribution
It introduces a simple, interpretable, and sparse ensemble method that automatically selects relevant nodes without tuning parameters, balancing interpretability and accuracy.
Findings
Achieves predictive accuracy competitive with tree ensembles, especially on low signal-to-noise data.
Produces sparse, interpretable models with few selected nodes.
Handles mixed data types and missing values effectively.
Abstract
When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as 'black box' predictors. Node harvest tries to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable, and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a…
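The two steps named above (generate many candidate nodes at random, then pick nonnegative weights by solving a quadratic program) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes one-dimensional inputs, uses random intervals in place of tree nodes, keeps only a nonnegativity constraint on the weights, and swaps in plain projected gradient descent for a proper QP solver. All function names (`make_nodes`, `design_matrix`, `fit_weights`, `loss`) are hypothetical.

```python
import random

def make_nodes(xs, ys, n_nodes, rng):
    """Candidate nodes: a root interval covering all data plus random intervals.
    Each node is stored as (lo, hi, mean response of the points inside)."""
    intervals = [(min(xs), max(xs))]              # root node contains every point
    while len(intervals) < n_nodes:
        lo, hi = sorted(rng.sample(xs, 2))        # endpoints drawn from the data
        intervals.append((lo, hi))
    nodes = []
    for lo, hi in intervals:
        inside = [y for x, y in zip(xs, ys) if lo <= x <= hi]
        nodes.append((lo, hi, sum(inside) / len(inside)))
    return nodes

def design_matrix(xs, nodes):
    """A[i][g] = node mean if observation i falls in node g, else 0."""
    return [[mu if lo <= x <= hi else 0.0 for lo, hi, mu in nodes] for x in xs]

def loss(A, w, ys):
    """Squared error of the weighted node predictions."""
    return sum((sum(a * wj for a, wj in zip(row, w)) - y) ** 2
               for row, y in zip(A, ys))

def fit_weights(A, ys, n_iter=500):
    """Minimise the squared error subject to w >= 0 by projected gradient descent.
    Assumption: this stands in for a QP solver; other constraints are omitted."""
    n_nodes = len(A[0])
    w = [1.0] + [0.0] * (n_nodes - 1)             # start from the root node only
    # Step 1/L, with L bounded by the squared Frobenius norm of A.
    step = 1.0 / sum(a * a for row in A for a in row)
    for _ in range(n_iter):
        resid = [sum(a * wj for a, wj in zip(row, w)) - y
                 for row, y in zip(A, ys)]
        grad = [sum(A[i][g] * resid[i] for i in range(len(A)))
                for g in range(n_nodes)]
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]  # project onto w >= 0
    return w

# Toy data: a noisy step function on [0, 1).
rng = random.Random(0)
xs = [i / 50 for i in range(50)]
ys = [(1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.3) for x in xs]

nodes = make_nodes(xs, ys, 30, rng)
A = design_matrix(xs, nodes)
w0 = [1.0] + [0.0] * (len(nodes) - 1)             # root-only baseline
w = fit_weights(A, ys)
selected = [(node, wj) for node, wj in zip(nodes, w) if wj > 0]
```

In this sketch, `selected` holds the nodes that keep a positive weight. A prediction for a new point would be the weighted sum of the means of the selected nodes that contain it.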
