Revisiting Randomization in Greedy Model Search

Xin Chen; Jason M. Klusowski; Yan Shuo Tan; Chang Yu

arXiv:2506.15643 · stat.ML · January 6, 2026

TL;DR

This paper analyzes how feature subsampling in greedy forward selection affects bias and variance. It shows that, under an orthogonal design, ensembling with feature subsampling can reduce bias as well as variance, rather than acting purely as a variance-reduction mechanism.

Contribution

It provides a theoretical analysis of how feature subsampling affects greedy model search, showing that ensembling can reduce both bias and variance and characterizing the asymptotic reweighting of coefficients.

Findings

01

Ensembling with feature subsampling can reduce both bias and variance.

02

Training error and degrees of freedom can be non-monotonic in the subsampling rate.

03

The ensemble estimator adaptively reweights coefficients according to their rank, with weights approximated by a logistic function.
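
The rank-based reweighting in finding 03 can be explored numerically. With orthonormal columns, greedy forward selection picks, among the candidate features, the one with the largest absolute OLS coefficient, so a feature's chance of entering a single randomized fit depends only on its rank. The following is a minimal Monte Carlo sketch of that inclusion probability. It assumes that at each of k steps, m candidates are drawn without replacement from the not-yet-selected features, which may differ from the paper's exact subsampling scheme; `inclusion_weights` is a hypothetical helper, not the authors' code, and it does not reproduce the paper's logistic approximation.

```python
import numpy as np


def inclusion_weights(p, k, m, n_sims=5000, seed=0):
    """Monte Carlo estimate of P(feature with rank r is selected), r = 0..p-1.

    Assumption (hypothetical scheme): orthonormal design, so greedy selection
    among candidates picks the candidate with the largest |OLS coefficient|.
    Rank 0 denotes the largest |coefficient|, so "best" = smallest rank.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(p)
    for _ in range(n_sims):
        remaining = list(range(p))  # ranks not yet selected in this fit
        for _ in range(k):
            # Draw m candidate features without replacement from the remaining ones.
            cand = rng.choice(remaining, size=min(m, len(remaining)), replace=False)
            best = int(cand.min())  # smallest rank = largest |coefficient|
            remaining.remove(best)
            counts[best] += 1
    return counts / n_sims
```

Setting m = p removes the randomness, so the weights become 1 for the top k ranks and 0 elsewhere. Setting m = 1 makes every k-subset equally likely, giving a flat weight of k/p. Intermediate m interpolates between these two extremes.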

Abstract

Feature subsampling is a core component of random forests and other ensemble methods. While recent theory suggests that this randomization acts solely as a variance reduction mechanism analogous to ridge regularization, these results largely rely on base learners optimized via ordinary least squares. We investigate the effects of feature subsampling on greedy forward selection, a model that better captures the adaptive nature of decision trees. Assuming an orthogonal design, we prove that ensembling with feature subsampling can reduce both bias and variance, contrasting with the pure variance reduction of convex base learners. More precisely, we show that both the training error and degrees of freedom can be non-monotonic in the subsampling rate, breaking the analogy with standard shrinkage methods like the lasso or ridge regression. Furthermore, we characterize the exact asymptotic…
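
To make the setup in the abstract concrete, here is a minimal sketch of a feature-subsampled greedy forward selection ensemble. The details are assumptions, not taken from the paper: k selection steps, m candidate features drawn without replacement from the not-yet-selected ones at each step, a least-squares refit on the selected set, and a plain average over independently randomized fits. `greedy_forward_selection` and `subsampled_ensemble` are hypothetical names.

```python
import numpy as np


def greedy_forward_selection(X, y, k, m, rng):
    """One randomized greedy forward-selection fit (hypothetical sketch).

    At each of k steps: draw m candidates from the unselected features, add the
    candidate most correlated with the current residual, refit least squares.
    """
    assert k >= 1
    p = X.shape[1]
    selected = []
    residual = y.copy()
    for _ in range(k):
        remaining = np.setdiff1d(np.arange(p), selected)
        cand = rng.choice(remaining, size=min(m, remaining.size), replace=False)
        # Score candidates by absolute correlation with the current residual.
        scores = np.abs(X[:, cand].T @ residual)
        selected.append(int(cand[np.argmax(scores)]))
        # Least-squares refit on the selected columns.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    beta = np.zeros(p)
    beta[selected] = coef
    return beta


def subsampled_ensemble(X, y, k, m, n_estimators=200, seed=0):
    """Average of n_estimators independently randomized fits."""
    rng = np.random.default_rng(seed)
    fits = [greedy_forward_selection(X, y, k, m, rng) for _ in range(n_estimators)]
    return np.mean(fits, axis=0)
```

With orthonormal columns (for example, `X` taken as the Q factor of `np.linalg.qr`), each fit's nonzero coefficients coincide with the full OLS coefficients `X.T @ y`. The ensemble therefore amounts to a per-coordinate reweighting of OLS by selection frequency, and setting m equal to p recovers plain top-k selection.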

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Machine Learning and Data Classification