Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation
Jonne Pohjankukka, Tapio Pahikkala, Paavo Nevalainen, Jukka Heikkonen

TL;DR
This paper introduces spatial k-fold cross validation (SKCV), a method for estimating the prediction performance of spatial models that accounts for spatial autocorrelation, which causes standard cross validation to produce optimistically biased estimates.
Contribution
The paper proposes SKCV, a modified cross validation technique that reduces bias caused by spatial autocorrelation in geographic data, improving performance estimates.
Findings
SKCV reduces optimistic bias by up to 40% compared to standard CV.
SKCV is effective for both regression and classification spatial models.
The method can guide data sampling density decisions in new research areas.
Abstract
In machine learning one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information data sets are an example where the data points have stronger dependencies among each other the closer they are geographically. This phenomenon, known as spatial autocorrelation (SAC), causes the standard cross validation (CV) methods to produce optimistically biased prediction performance estimates for spatial models, which can result in increased costs and accidents in practical applications. To overcome this problem we propose a modified version of the CV method called spatial k-fold cross validation (SKCV), which provides a useful estimate for model prediction performance without optimistic bias due to SAC. We test SKCV with three real world cases involving open natural data showing that the estimates produced by the ordinary…
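The abstract does not spell out how the modification works, so the following is only a minimal sketch of one way a spatially aware k-fold split could be built. It assumes that training points lying within a chosen distance of any test point are dropped before fitting, so that the model is never evaluated on points with close neighbours in its training set. The function name `spatial_kfold_splits` and the parameters `radius` and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np

def spatial_kfold_splits(coords, k, radius, seed=0):
    """Yield (train_idx, test_idx) index arrays for a spatially aware k-fold CV.

    Hypothetical helper, not the authors' implementation.
    coords : (n, d) array of point coordinates
    k      : number of folds
    radius : assumed exclusion distance around the test points
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    rng = np.random.default_rng(seed)
    # Random partition of the indices into k folds, as in ordinary k-fold CV.
    folds = np.array_split(rng.permutation(n), k)
    for test_idx in folds:
        # Pairwise Euclidean distances from every point to every test point.
        diff = coords[:, None, :] - coords[test_idx][None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # A point is excluded from training if it is a test point (distance 0)
        # or lies within `radius` of at least one test point.
        too_close = dist.min(axis=1) <= radius
        train_idx = np.flatnonzero(~too_close)
        yield train_idx, np.sort(test_idx)

# Usage on a small regular grid of sample locations.
grid = np.array([[x, y] for x in range(6) for y in range(6)], dtype=float)
for train_idx, test_idx in spatial_kfold_splits(grid, k=4, radius=1.5):
    print(len(train_idx), "training points,", len(test_idx), "test points")
```

With `radius=0` the split reduces to ordinary k-fold CV on distinct points. Increasing `radius` removes more training data near each test point, so comparing the resulting performance estimates across radii gives a rough sense of how strongly the ordinary estimate depends on nearby neighbours.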
