Technical report: Impact of evaluation metrics and sampling on the comparison of machine learning methods for biodiversity indicators prediction
Geneviève Robin, Cathia Le Hasif

TL;DR
This paper examines how the choice of evaluation metric and sampling strategy affects the comparison of machine learning methods for biodiversity indicator prediction, revealing that these choices significantly influence model rankings and their interpretation.
Contribution
It provides an empirical analysis of the effects of evaluation metrics and sampling schemes on ML model comparison in biodiversity monitoring tasks.
Findings
Different evaluation metrics produce varying model rankings.
Sampling approaches significantly affect performance assessment.
Classical metrics like MSE may overlook subtle performance differences.
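A toy illustration of the first and third findings, as a minimal sketch and not the paper's own experiment: on zero-inflated counts, the overall MSE and an MSE restricted to sites with non-zero observations can rank two predictors in opposite orders. The data, the two predictors, and the helper `mse_on_presences` are hypothetical and chosen only to make the effect visible.

```python
def mse(y_true, y_pred):
    """Mean squared error over all sites."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_on_presences(y_true, y_pred):
    """MSE restricted to sites where the observed count is non-zero.

    Hypothetical alternative metric, used here only for illustration.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


# Hypothetical zero-inflated abundance data: 8 empty sites, 2 occupied sites.
observed = [0] * 8 + [10, 20]

# Predictor A ignores the signal and always predicts zero.
always_zero = [0] * 10

# Predictor B recovers the occupied sites but over-predicts the empty ones.
overpredict = [8] * 8 + [10, 20]

for name, pred in [("always_zero", always_zero), ("overpredict", overpredict)]:
    print(name, mse(observed, pred), mse_on_presences(observed, pred))
```

Under these assumptions the overall MSE slightly prefers the all-zero predictor, while the presence-only variant prefers the other one, so the ranking depends on the metric.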
Abstract
Machine learning (ML) approaches are increasingly used in biodiversity monitoring. In particular, an important application is the problem of predicting biodiversity indicators such as species abundance, species occurrence or species richness, based on predictor sets containing, e.g., climatic and anthropogenic factors. Considering the impressive number of different ML methods available in the literature and the pace at which they are being published, it is crucial to develop uniform evaluation procedures, to allow the production of sound and fair empirical studies. However, defining fair evaluation procedures is challenging: because of well-documented, intrinsic properties of biodiversity indicators such as their zero-inflation and over-dispersion, it is not trivial to design good sampling schemes for cross-validation nor good evaluation metrics. Indeed, the classical Mean Squared…
Taxonomy
Topics: Species Distribution and Climate Change · Data Analysis with R
