Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling
Diana Koldasbayeva, Alexey Zaytsev

TL;DR
This paper evaluates how different cross-validation strategies affect the evaluation of species distribution models, emphasizing that spatial and temporal dependencies must be accounted for to obtain realistic performance estimates.
Contribution
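The paper introduces a benchmarking framework that compares multiple cross-validation (CV) designs and training strategies. It highlights the importance of blocking that accounts for spatial autocorrelation (SAC) for unbiased evaluation of species distribution models (SDMs).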
Findings
Random CV substantially overestimates model performance.
Blocking at the SAC range reduces bias in performance estimates.
Boosted ensemble models perform best under spatial CV.
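The finding about blocking at the SAC range can be illustrated with a minimal sketch. It assumes square grid blocks whose side length equals an SAC range that has already been estimated, and it assigns whole blocks to folds at random so that nearby, autocorrelated records never straddle the train/test boundary. The function names `spatial_block_folds` and `split_indices`, the example coordinates, and the 500-unit range are hypothetical; the paper's actual blocking procedure may differ.

```python
import math
import random


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign each point to a CV fold according to the square block it falls in.

    coords: list of (x, y) pairs in projected units (e.g. metres).
    block_size: block side length; here assumed to equal the SAC range.
    Returns one fold index per point. Points in the same block share a fold.
    """
    # Map every point to the integer grid cell (block) that contains it.
    block_ids = [(math.floor(x / block_size), math.floor(y / block_size))
                 for x, y in coords]
    # Shuffle the distinct blocks reproducibly, then deal them round-robin into folds.
    unique_blocks = sorted(set(block_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_blocks)
    block_to_fold = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [block_to_fold[b] for b in block_ids]


def split_indices(folds, k):
    """Return (train_idx, test_idx) for fold k."""
    train = [i for i, f in enumerate(folds) if f != k]
    test = [i for i, f in enumerate(folds) if f == k]
    return train, test


# Hypothetical example: the first two points lie in the same 500 x 500 block.
coords = [(120.0, 40.0), (130.0, 55.0), (900.0, 910.0), (2100.0, 300.0)]
folds = spatial_block_folds(coords, block_size=500.0, n_folds=2)
train_idx, test_idx = split_indices(folds, 0)
```

Random CV would instead assign each record to a fold independently, so the two neighbouring points above could land on opposite sides of the split; this is the leakage that makes random CV optimistic when SAC is present.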
Abstract
Evaluating the predictive performance of species distribution models (SDMs) under realistic deployment scenarios requires careful handling of spatial and temporal dependencies in the data. Cross-validation (CV) is the standard approach for model evaluation, but its design strongly influences the validity of performance estimates. When SDMs are intended for spatial or temporal transfer, random CV can lead to overoptimistic results due to spatial autocorrelation (SAC) among neighboring observations. We benchmark four machine learning algorithms (GBM, XGBoost, LightGBM, Random Forest) on two real-world presence-absence datasets, a temperate plant and an anadromous fish, using multiple CV designs: random, spatial, spatio-temporal, environmental, and forward-chaining. Two training data usage strategies (LAST FOLD and RETRAIN) are evaluated, with hyperparameter tuning performed within each…
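Forward-chaining, one of the CV designs listed in the abstract, is commonly implemented as an expanding window over time. The sketch below is a generic version that assumes each record carries a discrete time step such as a sampling year; the function `forward_chaining_splits` and the example years are hypothetical and are not taken from the paper.

```python
def forward_chaining_splits(years):
    """Yield (train_idx, test_idx, test_step) for expanding-window temporal CV.

    years: list with the time step (e.g. sampling year) of each record.
    The fold for step t trains on all records strictly before t and tests on
    records from step t, so the model never sees data from the future.
    """
    steps = sorted(set(years))
    for t in steps[1:]:  # the earliest step has no past data to train on
        train = [i for i, y in enumerate(years) if y < t]
        test = [i for i, y in enumerate(years) if y == t]
        yield train, test, t


# Hypothetical record years.
years = [2015, 2015, 2016, 2017, 2017, 2018]
splits = list(forward_chaining_splits(years))
```

Each successive fold enlarges the training window by one time step, which mimics deploying a model fitted on historical surveys to a later season.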
Taxonomy
Topics: Species Distribution and Climate Change · Data Analysis with R
Methods: Convolution · 1x1 Convolution · Global Average Pooling · Average Pooling · ALIGN · Dilated Convolution · Switchable Atrous Convolution
