Is interpolation benign for random forest regression?
Ludovic Arnould (LPSM (UMR 8001)), Claire Boyer (LPSM (UMR 8001), MOKAPLAN), Erwan Scornet (CMAP)

TL;DR
This paper investigates whether random forest regression models can interpolate training data without sacrificing predictive consistency, revealing conditions under which interpolation and consistency coexist.
Contribution
It provides the first theoretical analysis demonstrating that interpolating Random Forests can be consistent, emphasizing the role of feature randomization and averaging.
Findings
Breiman's RF are consistent when interpolating without bootstrap.
Interpolation area size converges quickly to zero, enabling consistency.
Adaptive RF can achieve both interpolation and consistency.
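The interpolation property discussed above can be illustrated with a small sketch. It assumes a toy forest whose trees split on a uniformly random coordinate at a uniformly random observed value; this is neither Breiman's CART criterion nor the Median forest studied in the paper. All names (`build_tree`, `predict_forest`, the synthetic data) are hypothetical. Every tree is grown on the full sample (no bootstrap) until each leaf holds one point, so each tree, and hence their average, reproduces the training labels.

```python
import math
import random


def build_tree(points, rng):
    """Grow a tree until every leaf holds a single distinct input.

    points: list of (x, y) pairs, with x a tuple of floats.
    Splits are random (toy assumption), not optimized as in CART.
    """
    dims = list(range(len(points[0][0])))
    rng.shuffle(dims)  # feature randomization: random coordinate order
    for d in dims:
        values = sorted({x[d] for x, _ in points})
        if len(values) > 1:
            # Split at a random observed value; both children are non-empty.
            threshold = values[rng.randrange(len(values) - 1)]
            left = [p for p in points if p[0][d] <= threshold]
            right = [p for p in points if p[0][d] > threshold]
            return ("node", d, threshold,
                    build_tree(left, rng), build_tree(right, rng))
    # No coordinate separates the points: a leaf storing the mean label.
    return ("leaf", sum(y for _, y in points) / len(points))


def predict_tree(tree, x):
    """Route x down to its leaf and return the stored label."""
    while tree[0] == "node":
        _, d, threshold, left, right = tree
        tree = left if x[d] <= threshold else right
    return tree[1]


def predict_forest(trees, x):
    """Average the tree predictions (no bootstrap, equal weights)."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(50)]
y = [math.sin(4 * x[0]) + rng.gauss(0, 0.5) for x in X]  # noisy labels
train = list(zip(X, y))

forest = [build_tree(train, rng) for _ in range(25)]

# Largest gap between the forest prediction and the noisy training label.
max_train_error = max(abs(predict_forest(forest, x) - yi) for x, yi in train)
print(max_train_error)
```

The printed gap should be zero up to floating-point rounding, even though the labels are noisy. This shows interpolation only; it says nothing about consistency, which depends on how the trees behave away from the training points.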
Abstract
Statistical wisdom suggests that very complex models, interpolating training data, will be poor at predicting unseen examples. Yet, this aphorism has been recently challenged by the identification of benign overfitting regimes, specially studied in the case of parametric models: generalization capabilities may be preserved despite model high complexity. While it is widely known that fully-grown decision trees interpolate and, in turn, have bad predictive performances, the same behavior is yet to be analyzed for Random Forests (RF). In this paper, we study the trade-off between interpolation and consistency for several types of RF algorithms. Theoretically, we prove that interpolation regimes and consistency cannot be achieved simultaneously for several non-adaptive RF. Since adaptivity seems to be the cornerstone to bring together interpolation and consistency, we study interpolating Median…
Taxonomy
Topics: Neural Networks and Applications · Statistical Methods and Inference · Machine Learning and Data Classification
