TL;DR
This study compares machine-learning and traditional statistical models on spatial ecological data, emphasizing bias reduction in performance estimates and hyperparameter tuning. GAM and RF outperform the other models in predicting forest disease distribution.
Contribution
It introduces a comprehensive evaluation of spatial cross-validation methods and hyperparameter tuning for ecological modeling, highlighting the superiority of GAM and RF.
Findings
GAM and RF achieved the highest predictive accuracy with AUROC around 0.7.
Bias-reduced performance estimates are significantly lower than non-spatial estimates.
Spatial partitioning improves hyperparameter tuning for spatial data.
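To make the difference between spatial and non-spatial partitioning concrete, here is a minimal Python sketch. It is not the paper's procedure, which this summary does not describe. It assumes a simple grid-block scheme as a stand-in: nearby observations fall into the same block, and each block is held out as a whole. The function names (block_key, spatial_block_folds, train_test_splits) and the block_size parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def block_key(point, block_size):
    """Return the grid cell (column, row) that contains a point."""
    x, y = point
    return (int(x // block_size), int(y // block_size))


def spatial_block_folds(coords, n_folds, block_size):
    """Build folds in which every grid block belongs to exactly one fold."""
    blocks = defaultdict(list)
    for idx, point in enumerate(coords):
        blocks[block_key(point, block_size)].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Deal whole blocks to folds round-robin; sorting keeps the result reproducible.
    for i, key in enumerate(sorted(blocks)):
        folds[i % n_folds].extend(blocks[key])
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical sample locations on a regular 10 x 10 grid (assumption).
    coords = [(x, y) for x in range(10) for y in range(10)]
    folds = spatial_block_folds(coords, n_folds=4, block_size=5)
    for train, test in train_test_splits(folds):
        test_blocks = {block_key(coords[i], 5) for i in test}
        print(len(train), len(test), sorted(test_blocks))
```

Because a test fold never shares a block with the training data, spatially autocorrelated neighbours are less likely to leak across the split, which is the usual reason spatial estimates come out lower than random-split estimates. The same splitter could be applied again inside each training set to choose hyperparameters, giving a nested setup.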
Abstract
Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising predictive performance in classification problems. While applying such algorithms has become much simpler thanks to their well-documented integration in commonly used statistical programming languages such as R, several practical challenges remain in ecological modeling: unbiased performance estimation, optimization of algorithms through hyperparameter tuning, and spatial autocorrelation. We address these issues in a comparison of several widely used machine-learning algorithms, such as Boosted Regression Trees (BRT), weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), to traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like…
Taxonomy
Methods: Generalized Additive Models · Logistic Regression
