Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance

Amichai Painsky; Saharon Rosset

arXiv:1512.03444·stat.ML·December 14, 2015·IEEE Trans. Pattern Anal. Mach. Intell.

TL;DR

This paper introduces a cross-validation-based method for selecting the splitting variable in tree models. The method improves predictive accuracy and lets trees make effective use of categorical variables with many categories, which traditional tree algorithms handle poorly.

Contribution

The paper proposes a leave-one-out (LOO) cross-validation approach for selecting the splitting variable in trees, improving predictive performance and handling categorical variables with many categories.

Findings

01 Significant performance improvements in tree and ensemble models.

02 Effective utilization of categorical variables with many categories.

03 Comparable computational complexity to CART for classification tasks.

Abstract

Recursive partitioning approaches producing tree-like models are a long-standing staple of predictive modeling, in the last decade mostly as "sub-learners" within state-of-the-art ensemble methods like Boosting and Random Forest. However, a fundamental flaw in the partitioning (or splitting) rule of commonly used tree building methods precludes them from treating different types of variables equally. This most clearly manifests in these methods' inability to properly utilize categorical variables with a large number of categories, which are ubiquitous in the new age of big data. Such variables can often be very informative, but current tree methods essentially leave us a choice of either not using them, or exposing our models to severe overfitting. We propose a conceptual framework to splitting using leave-one-out (LOO) cross validation for selecting the splitting variable, then…
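
The contrast the abstract draws, between scoring a candidate splitting variable on the training data and scoring it by leave-one-out cross validation, can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a regression setting with squared error, a multiway split (one child per category) instead of a binary split, and a fallback to the global leave-one-out mean for categories with a single observation. The function names `in_sample_sse` and `loo_sse` and the toy data are hypothetical.

```python
import numpy as np

def in_sample_sse(x, y):
    """Training squared error when each category is predicted by its own mean."""
    sse = 0.0
    for c in np.unique(x):
        yc = y[x == c]
        sse += np.sum((yc - yc.mean()) ** 2)
    return sse

def loo_sse(x, y):
    """Leave-one-out squared error: each point is predicted by the mean of the
    OTHER points in its category (illustrative assumption, not the paper's code)."""
    n = len(y)
    total = y.sum()
    sse = 0.0
    for c in np.unique(x):
        yc = y[x == c]
        k = len(yc)
        if k > 1:
            pred = (yc.sum() - yc) / (k - 1)
        else:
            # Assumption: a singleton category has no other members, so fall
            # back to the mean of all remaining observations.
            pred = (total - yc) / (n - 1)
        sse += np.sum((yc - pred) ** 2)
    return sse

# Toy data: one binary variable that drives y, and one ID-like categorical
# variable with a distinct category per observation (unrelated to y).
rng = np.random.default_rng(0)
n = 200
informative = np.arange(n) % 2
row_id = np.arange(n)
y = 1.0 * informative + rng.normal(size=n)

for name, x in [("informative", informative), ("row_id", row_id)]:
    print(f"{name:12s} in-sample SSE = {in_sample_sse(x, y):8.2f}   "
          f"LOO SSE = {loo_sse(x, y):8.2f}")
```

By construction, the ID-like variable reaches a training error of exactly zero, so an in-sample criterion would always prefer it even though it carries no information about `y`. Under the leave-one-out score it can do no better than the global mean, so on data like this the binary variable should be selected instead.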
