Making Sense of Random Forest Probabilities: a Kernel Perspective
Matthew A. Olson, Abraham J. Wyner

TL;DR
This paper links random forest probability estimation to kernel regression, providing a statistically grounded approach and offering insights for tuning to improve probability accuracy.
Contribution
It establishes a kernel perspective on random forest probabilities, connecting them to kernel regression and guiding better tuning practices.
Findings
Random forests can be interpreted through a proximity kernel lens.
The kernel perspective clarifies the geometry and sparsity in probability estimation.
The paper gives recommendations for tuning a random forest to improve its probability estimates.
Abstract
A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.
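To make the contrast in the abstract concrete, here is a minimal sketch in plain Python. It compares the usual vote-counting estimate with a kernel-regression estimate that uses a proximity kernel, taken here to be the fraction of trees in which two points share a leaf. The leaf assignments, the labels, and the function names (`proximity`, `kernel_probability`, `vote_fraction`) are all hypothetical and chosen for illustration. This is not the paper's exact construction.

```python
# Hypothetical data: train_leaves[i][t] is the leaf index that training
# point i lands in for tree t. These numbers are made up for illustration.
train_leaves = [
    [0, 1, 0],
    [0, 1, 2],
    [1, 0, 2],
    [1, 0, 1],
]
train_labels = [1, 1, 0, 0]   # binary class labels for the training points
query_leaves = [0, 1, 2]      # leaves reached by a query point in each tree


def proximity(leaves_a, leaves_b):
    """Proximity kernel: fraction of trees in which two points share a leaf."""
    shared = sum(a == b for a, b in zip(leaves_a, leaves_b))
    return shared / len(leaves_a)


def kernel_probability(query, train, labels):
    """Kernel-regression (weighted-average) estimate of P(y = 1 | query)."""
    weights = [proximity(query, row) for row in train]
    total = sum(weights)
    if total == 0:
        raise ValueError("query shares no leaf with any training point")
    return sum(w * y for w, y in zip(weights, labels)) / total


def vote_fraction(query, train, labels):
    """Fraction of trees whose leaf for the query has a strict class-1 majority.

    Ties are counted as a vote against class 1 (an arbitrary choice here).
    """
    votes = 0
    for t in range(len(query)):
        members = [y for row, y in zip(train, labels) if row[t] == query[t]]
        if members and 2 * sum(members) > len(members):
            votes += 1
    return votes / len(query)


if __name__ == "__main__":
    print("kernel estimate:", kernel_probability(query_leaves, train_leaves, train_labels))
    print("vote fraction:  ", vote_fraction(query_leaves, train_leaves, train_labels))
```

On this toy data the two estimates differ (5/6 from the kernel average, 2/3 from vote counting), because the third tree's leaf is evenly split and contributes no vote, while the kernel average still uses the labels in that leaf.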
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Face and Expression Recognition
