Random forests for binary geospatial data
Arkajyoti Saha, Abhirup Datta

TL;DR
This paper introduces RF-GP, a novel method combining random forests and Gaussian processes for non-linear regression of binary geospatial data, explicitly accounting for spatial dependence and outperforming existing methods.
Contribution
The paper develops RF-GP, a new approach that integrates RF-GLS with Gaussian process models for binary data, providing theoretical guarantees and improved performance.
Findings
RF-GP outperforms competing methods in simulations.
Theoretical consistency is established for RF-GP.
RF-GP effectively models spatial dependence in binary data.
Abstract
The manuscript develops a new method and theory for non-linear regression for binary dependent data using random forests. Existing implementations of random forests for binary data cannot explicitly account for the data correlation common in geospatial and time-series settings. For continuous outcomes, recent work has extended random forests (RF) to RF-GLS, which incorporates spatial covariance using the generalized least squares (GLS) loss. However, adopting this idea for binary data is challenging because classification trees use the Gini impurity measure, which has no known extension to model dependence. We show that for binary data, the GLS loss is also an extension of the Gini impurity measure, as the latter is exactly equivalent to the ordinary least squares (OLS) loss. This justifies using RF-GLS for non-parametric mean function estimation for binary dependent data. We then…
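The equivalence between the Gini impurity and the OLS loss for binary outcomes can be checked numerically. The sketch below is an illustration only, not the authors' implementation: the simulated data, the helper names (`gini`, `ols_loss`, `split_cost`), and the choice of the unnormalized Gini form 1 - p^2 - (1 - p)^2 are all assumptions. Under that convention the Gini impurity of a node equals twice the mean squared error around the node mean, so both criteria rank candidate splits identically.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    p = y.mean()
    return 1.0 - p**2 - (1.0 - p) ** 2

def ols_loss(y):
    """Mean squared error around the node mean; equals p(1 - p) for 0/1 data."""
    return np.mean((y - y.mean()) ** 2)

def split_cost(x, y, threshold, impurity):
    """Size-weighted impurity of the two children produced by a threshold split."""
    left, right = y[x <= threshold], y[x > threshold]
    return (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)

# Hypothetical simulated data: one covariate and a binary response.
rng = np.random.default_rng(0)
x = rng.uniform(size=200)
prob = 1.0 / (1.0 + np.exp(-6.0 * (x - 0.5)))
y = (rng.uniform(size=200) < prob).astype(float)

# Candidate thresholds are midpoints between sorted covariate values,
# so both children are always non-empty.
xs = np.sort(x)
thresholds = (xs[:-1] + xs[1:]) / 2.0

gini_costs = np.array([split_cost(x, y, t, gini) for t in thresholds])
ols_costs = np.array([split_cost(x, y, t, ols_loss) for t in thresholds])

# The two criteria differ only by a constant factor of 2.
print(np.allclose(gini_costs, 2.0 * ols_costs))
```

Because the two cost curves are proportional, any threshold minimizing one also minimizes the other. This is the property that lets a least-squares loss stand in for the Gini criterion; the factor of 2 depends on the normalization chosen for the Gini impurity.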
Taxonomy
Topics: Soil Geostatistics and Mapping · Geochemistry and Geologic Mapping · Data-Driven Disease Surveillance
Methods: Gaussian Process
