Targeting predictors in random forest regression
Daniel Borup, Bent Jesper Christensen, Nicolaj Nørgaard Mühlbach, Mikkel Slot Nielsen

TL;DR
This paper introduces a targeting method for random forest regression that improves predictive accuracy by controlling split placement along strong predictors, especially in high-dimensional sparse data settings.
Contribution
It proposes a targeting approach that improves split selection in RF, analyzes the resulting bias-variance trade-off, and demonstrates accuracy improvements in macroeconomic and financial applications.
Findings
Targeting increases individual tree strength.
The bias-variance trade-off is best balanced at a medium degree of targeting, retaining the best 10-30% of predictors.
Predictive accuracy improves by up to 13%.
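A targeting step of the kind summarized above can be sketched as ranking predictors and keeping only the top fraction before fitting the forest. The sketch below assumes targeting by hard thresholding on the absolute marginal correlation with the target; this is an illustrative choice, not necessarily the rule used in the paper. The helpers `abs_corr` and `target_predictors` are hypothetical names, and the retained column indices would then be passed to any random forest implementation.

```python
import math
from statistics import fmean


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(sxy) / math.sqrt(sxx * syy)


def target_predictors(X, y, fraction=0.2):
    """Hypothetical targeting rule: keep the top `fraction` of columns of X
    ranked by |corr(x_j, y)|. X is a list of rows; at least one column is kept."""
    p = len(X[0])
    k = max(1, math.ceil(fraction * p))
    cols = [[row[j] for row in X] for j in range(p)]
    scores = [abs_corr(c, y) for c in cols]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (assumed for illustration): 10 predictors, two perfectly
# (anti-)correlated with y, one weakly correlated, the rest constant.
y = [1, 2, 3, 4, 5, 6, 7, 8]
alt = [1, -1, 1, -1, 1, -1, 1, -1]
X = [[yi, -yi, 0, 0, 0, ai, 0, 0, 0, 0] for yi, ai in zip(y, alt)]

print(target_predictors(X, y, 0.2))  # keeps 2 of 10 columns: [0, 1]
print(target_predictors(X, y, 0.3))  # keeps 3 of 10 columns: [0, 1, 5]
```

Marginal correlation is only one possible ranking; a penalized regression or a model-based importance score could replace `abs_corr` without changing the structure of the selection step.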
Abstract
Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using representative finite samples. Moreover, we quantify the immediate gain from targeting in terms of increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 10-30% of commonly applied predictors. Improvements in predictive accuracy of…
Taxonomy
Topics: Statistical Methods and Inference · Financial Risk and Volatility Modeling · Market Dynamics and Volatility
