Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

Patrick Schratz; Jannes Muenchow; Eugenia Iturritxa; Jakob Richter; Alexander Brenning

arXiv:1803.11266 · stat.ML · October 7, 2019

TL;DR

This study compares machine-learning and traditional statistical models on spatial ecological data, emphasizing bias-reduced performance estimation and hyperparameter tuning; GAM and RF outperform the other models in predicting the distribution of a forest disease.

Contribution

It presents a comprehensive evaluation of spatial cross-validation for performance estimation and hyperparameter tuning in ecological modeling, in which GAM and RF perform best.
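Spatial cross-validation replaces random partitioning with spatially coherent folds, so that test observations are not simply near neighbours of training observations. One common way to build such folds is k-means clustering of the observation coordinates. The following is a minimal Python sketch under that assumption; it is not the authors' implementation, and the names `kmeans_folds` and `spatial_cv_splits` are hypothetical. It uses a deterministic farthest-point initialisation rather than random starts so that the result is reproducible.

```python
import math
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


def kmeans_folds(coords: List[Point], k: int, n_iter: int = 100) -> List[int]:
    """Hypothetical helper: assign each observation to one of k spatial folds
    by k-means (Lloyd's algorithm) on its (x, y) coordinates."""
    # Deterministic farthest-point initialisation: start at the first point,
    # then repeatedly add the point farthest from all current centres.
    centers = [coords[0]]
    while len(centers) < k:
        centers.append(max(coords, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in coords]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:  # converged: centres no longer move
            break
        centers = new_centers
    return labels


def spatial_cv_splits(labels: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, holding out one spatial cluster at a time."""
    for fold in range(k):
        test = [i for i, lab in enumerate(labels) if lab == fold]
        train = [i for i, lab in enumerate(labels) if lab != fold]
        yield train, test


# Two well-separated groups of plots end up in different folds.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = kmeans_folds(coords, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because each test fold is a whole cluster, a model cannot score well merely by exploiting spatial autocorrelation between a test plot and a nearby training plot.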

Findings

01

GAM and RF achieved the highest predictive performance, with mean AUROC values of about 0.7.

02

Bias-reduced performance estimates from spatial cross-validation are substantially lower than the over-optimistic estimates from non-spatial cross-validation.

03

Spatial partitioning improves hyperparameter tuning for spatial data.
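Finding 01 reports performance as AUROC, the area under the receiver operating characteristic curve. It equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, so 0.5 corresponds to random ranking and 1.0 to perfect separation. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, counting ties as one half; the function name `auroc` is hypothetical, and libraries usually rely on an equivalent rank-based formula that scales better.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via its pairwise definition: the share of positive/negative pairs
    in which the positive case scores higher, ties counted as one half.
    O(n_pos * n_neg), which is fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a cross-validation setting, AUROC is computed on each held-out fold and the fold values are then averaged, which is how a mean AUROC such as the roughly 0.7 above is obtained.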

Abstract

Machine-learning algorithms have gained popularity in ecological modeling in recent years because of their promising predictive performance on classification problems. While applying such algorithms has become much simpler thanks to their well-documented integration into commonly used statistical programming languages such as R, several practical challenges remain in ecological modeling: unbiased performance estimation, optimization of algorithms through hyperparameter tuning, and spatial autocorrelation. We address these issues by comparing several widely used machine-learning algorithms, such as Boosted Regression Trees (BRT), weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like…
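The abstract names unbiased performance estimation and hyperparameter tuning as two of the practical challenges. A standard way to keep tuning from inflating the performance estimate is nested cross-validation: an outer loop estimates performance on held-out folds, and an inner loop, run only on each outer training set, selects hyperparameters. In line with Finding 03, both loops below use spatial folds. This is a minimal sketch assuming a small grid of candidate settings (random search is a common alternative); `nested_cv`, `splits`, `score_fn` and `fold_fn` are hypothetical names, and this is not the authors' implementation.

```python
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Params = Dict[str, float]
# Hypothetical score_fn(train_idx, test_idx, params): fit on train_idx with params
# and return a performance score on test_idx (higher is better, e.g. AUROC).
ScoreFn = Callable[[List[int], List[int], Params], float]
# Hypothetical fold_fn(idx): return a spatial fold label for each index in idx,
# e.g. from k-means clustering of the coordinates of those observations.
FoldFn = Callable[[List[int]], List[int]]


def splits(idx: List[int], labels: List[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) subsets of idx, holding out one fold label at a time."""
    for fold in sorted(set(labels)):
        test = [i for i, lab in zip(idx, labels) if lab == fold]
        train = [i for i, lab in zip(idx, labels) if lab != fold]
        yield train, test


def nested_cv(n: int, candidates: Sequence[Params],
              score_fn: ScoreFn, fold_fn: FoldFn) -> List[float]:
    """Return one score per outer spatial fold, with tuning nested inside."""
    all_idx = list(range(n))
    outer_scores = []
    for outer_train, outer_test in splits(all_idx, fold_fn(all_idx)):
        # Inner spatial partitioning uses the outer training set only, so the
        # outer test fold never influences which hyperparameters are chosen.
        inner_labels = fold_fn(outer_train)
        best = max(
            candidates,
            key=lambda params: mean(
                score_fn(tr, te, params) for tr, te in splits(outer_train, inner_labels)
            ),
        )
        # Refit with the selected setting and evaluate once on the outer fold.
        outer_scores.append(score_fn(outer_train, outer_test, best))
    return outer_scores
```

Averaging the returned outer-fold scores gives the performance estimate. Swapping `fold_fn` for a random fold assignment would instead yield the non-spatial estimate that, per Finding 02, tends to be over-optimistic on spatially autocorrelated data.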

Code & Models

Repositories

pat-s/pathogen-modeling (official)

Taxonomy

Methods: Generalized additive models · Logistic Regression