Random Forests for dependent data
Arkajyoti Saha, Sumanta Basu, Abhirup Datta

TL;DR
This paper introduces RF-GLS, an extension of random forests designed for dependent data like time series and spatial data, improving estimation and prediction accuracy by accounting for error dependence.
Contribution
The paper proposes RF-GLS, a novel random forest extension that incorporates dependence structures, and proves its consistency under various dependent error processes.
Findings
RF-GLS outperforms standard RF in dependent data scenarios.
RF-GLS is consistent under beta-mixing error processes.
The paper gives the first proof of RF consistency under dependence.
Abstract
Random forest (RF) is one of the most popular methods for estimating regression functions. The local nature of the RF algorithm, based on intra-node means and variances, is ideal when errors are i.i.d. For dependent error processes like time series and spatial settings where data in all the nodes will be correlated, operating locally ignores this dependence. Also, RF will involve resampling of correlated data, violating the principles of bootstrap. Theoretically, consistency of RF has been established for i.i.d. errors, but little is known about the case of dependent errors. We propose RF-GLS, a novel extension of RF for dependent error processes in the same way Generalized Least Squares (GLS) fundamentally extends Ordinary Least Squares (OLS) for linear models under dependence. The key to this extension is the equivalent representation of the local decision-making in a regression…
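The OLS-to-GLS analogy in the abstract can be made concrete with a small numerical sketch. It is only an illustration of that analogy, not the authors' RF-GLS algorithm. It relies on a standard fact: OLS on a leaf-membership indicator matrix returns the per-leaf sample means, while GLS with a known error covariance reweights those observations. The AR(1)-type covariance, the single split at 0.5, the leaf values, and all function names are assumptions made for this example.

```python
import numpy as np

def ar1_covariance(n, rho, sigma2=1.0):
    """AR(1)-type covariance matrix with entries sigma2 * rho**|i - j|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def ols_fit(Z, y):
    """Ordinary least squares: minimizes (y - Z b)' (y - Z b)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def gls_fit(Z, y, cov):
    """Generalized least squares: minimizes (y - Z b)' cov^{-1} (y - Z b)."""
    # Whiten with the Cholesky factor L of cov (cov = L L'), then run OLS.
    L = np.linalg.cholesky(cov)
    Zw = np.linalg.solve(L, Z)
    yw = np.linalg.solve(L, y)
    return np.linalg.lstsq(Zw, yw, rcond=None)[0]

rng = np.random.default_rng(0)
n, rho = 200, 0.8                      # hypothetical sample size and correlation
x = np.sort(rng.uniform(size=n))       # ordered covariate, e.g. time
leaf = (x > 0.5).astype(int)           # hypothetical single split at 0.5
Z = np.eye(2)[leaf]                    # n x 2 leaf-membership indicators

cov = ar1_covariance(n, rho)
errors = np.linalg.cholesky(cov) @ rng.standard_normal(n)  # correlated errors
y = np.where(leaf == 1, 2.0, -1.0) + errors                # assumed leaf values

beta_ols = ols_fit(Z, y)               # equals the per-leaf sample means
beta_gls = gls_fit(Z, y, cov)          # accounts for the error correlation
node_means = np.array([y[leaf == 0].mean(), y[leaf == 1].mean()])

print("node means:", node_means)
print("OLS       :", beta_ols)
print("GLS       :", beta_gls)
```

With an identity covariance, `gls_fit` reduces to `ols_fit`, which mirrors the i.i.d. case where local node means are appropriate. In practice the covariance is unknown and has to be estimated or specified as a working model; the sketch treats it as known for simplicity.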
Taxonomy
Topics: Advanced Statistical Methods and Models · Spectroscopy and Chemometric Analyses · Gaussian Processes and Bayesian Inference
Methods: Gaussian Process
