An RKHS Perspective on Tree Ensembles
Mehdi Dagdoug, Clement Dombry, Jean-Jil Duchamps

TL;DR
This paper introduces an RKHS framework for analyzing tree ensemble methods such as Random Forests and Gradient Boosting, yielding theoretical results on the associated kernel together with practical tools for interpretability and variable importance.
Contribution
The paper develops a data-dependent RKHS perspective on tree ensembles, establishes the analytical properties of the resulting kernel, and links ensemble predictors to variational and gradient flow interpretations.
Findings
The Random Forest kernel is bounded, continuous, and universal.
A Random Forest predictor is the unique minimizer of a penalized empirical risk functional in the RKHS.
The paper introduces kernel PCA and GVI as tools for interpretability and variable importance.
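To make the idea of a kernel built on random partitions concrete, here is a minimal sketch in plain Python. It is not the paper's construction: the paper's kernel is data-dependent and built from randomized regression trees, whereas this toy version draws purely random axis-aligned splits of the unit square. The helper names (`sample_partition`, `cell_id`, `partition_kernel`) and all parameter values are hypothetical, chosen only for illustration. The kernel value for two points is the fraction of partitions in which they fall into the same cell.

```python
import random


def sample_partition(dim, depth, rng):
    """Draw one random axis-aligned partition of [0, 1]^dim as nested tuples.

    Assumption: splits are data-independent (uniform axis, uniform threshold),
    unlike the data-dependent trees analyzed in the paper.
    """
    def build(lower, upper, level):
        if level == depth:
            return None  # a leaf, i.e. one cell of the partition
        axis = rng.randrange(dim)
        threshold = rng.uniform(lower[axis], upper[axis])
        left_upper = list(upper)
        left_upper[axis] = threshold
        right_lower = list(lower)
        right_lower[axis] = threshold
        return (
            axis,
            threshold,
            build(lower, left_upper, level + 1),
            build(right_lower, upper, level + 1),
        )

    return build([0.0] * dim, [1.0] * dim, 0)


def cell_id(tree, x):
    """Return the sequence of left/right moves that identifies x's cell."""
    path = []
    node = tree
    while node is not None:
        axis, threshold, left, right = node
        if x[axis] <= threshold:
            path.append(0)
            node = left
        else:
            path.append(1)
            node = right
    return tuple(path)


def partition_kernel(partitions, x, y):
    """Fraction of partitions in which x and y share a cell."""
    shared = sum(cell_id(tree, x) == cell_id(tree, y) for tree in partitions)
    return shared / len(partitions)


rng = random.Random(0)
partitions = [sample_partition(dim=2, depth=3, rng=rng) for _ in range(200)]
points = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]

# Gram matrix of the toy kernel on three points.
gram = [[partition_kernel(partitions, p, q) for q in points] for p in points]
```

By construction this toy kernel takes values in [0, 1], is symmetric, and equals 1 on the diagonal, which gives some intuition for the boundedness property listed above. Continuity and universality depend on the distribution of the partitions and are not illustrated by this sketch.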
Abstract
Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictions are obtained by aggregating many randomized regression trees. In this paper, we develop a theoretical framework for analyzing such methods through Reproducing Kernel Hilbert Spaces (RKHSs) constructed on tree ensembles -- more precisely, on the random partitions generated by randomized regression trees. We establish fundamental analytical properties of the resulting Random Forest kernel, including boundedness, continuity, and universality, and show that a Random Forest predictor can be characterized as the unique minimizer of a penalized empirical risk functional in this RKHS, providing a variational interpretation of ensemble learning. We further extend this perspective to the continuous-time…
Taxonomy
Topics: Face and Expression Recognition · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
