TL;DR
This paper introduces a methodology for delineating the area of applicability of spatial prediction models, identifying where predictions can be considered reliable based on how similar the environmental conditions at a location are to those covered by the training data.
Contribution
The paper proposes a dissimilarity index (DI) and a threshold on that index to delineate the area of applicability (AOA) of spatial models. This makes it possible to assess where predictions are reliable rather than reporting a single error estimate for the whole map.
Findings
Prediction error within the AOA aligns with cross-validation RMSE.
A threshold at the 0.95 quantile of the DI values of the training data effectively delineates the AOA.
The approach works for both random and clustered training data.
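A minimal sketch of how the threshold and the error comparison in these findings could be computed is shown below. It assumes DI values are already available for the training data and for new prediction locations. The function names are hypothetical, and the quantile uses NumPy's default linear interpolation. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np


def aoa_threshold(train_di, q=0.95):
    """Threshold on the dissimilarity index: the q-quantile of the training DI values.

    Hypothetical helper; uses NumPy's default (linear) quantile interpolation.
    """
    return float(np.quantile(np.asarray(train_di, dtype=float), q))


def area_of_applicability(new_di, threshold):
    """Boolean mask that is True where a location's DI does not exceed the threshold."""
    return np.asarray(new_di, dtype=float) <= threshold


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def rmse_inside_aoa(observed, predicted, in_aoa):
    """RMSE restricted to locations inside the AOA.

    Per the findings, this value should be comparable to the cross-validation RMSE.
    """
    mask = np.asarray(in_aoa, dtype=bool)
    return rmse(np.asarray(observed)[mask], np.asarray(predicted)[mask])


# Usage with synthetic DI values (illustrative only, not data from the paper).
train_di = np.linspace(0.0, 1.0, 101)
threshold = aoa_threshold(train_di)
mask = area_of_applicability([0.2, 0.9, 1.3], threshold)
print(threshold, mask)
```

Locations flagged `False` fall outside the AOA, so the cross-validation error should not be assumed to hold there.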
Abstract
Predictive modelling using machine learning has become very popular for spatial mapping of the environment. Models are often applied to make predictions far beyond sampling locations where new geographic locations might considerably differ from the training data in their environmental properties. However, areas in the predictor space without support of training data are problematic. Since the model has no knowledge about these environments, predictions have to be considered uncertain. Estimating the area to which a prediction model can be reliably applied is required. Here, we suggest a methodology that delineates the "area of applicability" (AOA) that we define as the area for which the cross-validation error of the model applies. We first propose a "dissimilarity index" (DI) that is based on the minimum distance to the training data in the predictor space, with predictors being…
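The DI described in the abstract can be sketched as the distance from each new location to its nearest training sample in predictor space. The abstract is cut off before it states how predictors are treated, so this sketch makes several assumptions. It standardizes predictors with the training mean and standard deviation and allows optional per-predictor weights. It also divides distances by the mean pairwise distance among training samples so that values are comparable across datasets. All of these choices, and the function name, are hypothetical and do not represent the paper's exact definition.

```python
import numpy as np


def dissimilarity_index(train_x, new_x, weights=None):
    """Minimum distance from each new point to the training data in predictor space.

    Assumptions (not taken from the source): predictors are standardized with the
    training mean/sd, optionally multiplied by per-predictor weights, and the
    minimum distance is divided by the mean pairwise distance among training points.
    Requires at least two training points.
    """
    train_x = np.asarray(train_x, dtype=float)
    new_x = np.asarray(new_x, dtype=float)

    # Standardize with training statistics so predictors in large units do not dominate.
    mean = train_x.mean(axis=0)
    sd = train_x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant predictors

    # Hypothetical per-predictor weights; equal weights if none are given.
    w = np.ones(train_x.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    tr = (train_x - mean) / sd * w
    nw = (new_x - mean) / sd * w

    # Euclidean distances from every new point to every training point.
    d_new = np.sqrt(((nw[:, None, :] - tr[None, :, :]) ** 2).sum(axis=-1))
    min_dist = d_new.min(axis=1)

    # Mean distance over all distinct pairs of training points (assumed scaling term).
    d_tr = np.sqrt(((tr[:, None, :] - tr[None, :, :]) ** 2).sum(axis=-1))
    upper = np.triu_indices(tr.shape[0], k=1)
    mean_pair = d_tr[upper].mean()

    return min_dist / mean_pair


# Usage with two training points and three new locations (synthetic values).
di = dissimilarity_index([[0, 0], [2, 0]], [[1, 0], [0, 0], [4, 0]])
print(di)
```

A new location identical to a training sample gets a DI of 0, and the DI grows as the location moves away from all training samples in predictor space. Comparing these values against a threshold derived from the training data then separates locations inside the AOA from those outside it.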
