Cross validation for model selection: a primer with examples from ecology
Luke Yates, Zach Aandahl, Shane A. Richards, and Barry W. Brook

TL;DR
This paper provides a comprehensive primer on cross-validation (CV) techniques for model selection in ecology. It emphasizes practical application and technical considerations, addresses common misconceptions, and uses case studies to illustrate the effectiveness of CV.
Contribution
It offers detailed guidance on applying CV in ecological research, covering common variants, technical nuances, and recommendations for use, and helps bridge the gap between theory and practice.
Findings
CV is versatile and applicable even without likelihood or parameter counts.
Leave-one-out (LOO) CV and its approximations minimize bias in estimated model scores.
Calibrated model selection rules help prevent overfitting.
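To make the idea of a parsimonious selection rule concrete, here is a minimal Python sketch of the classic one-standard-error rule: among models whose mean CV loss lies within one standard error of the best model's mean loss, pick the simplest. This is a swapped-in, widely used rule, not necessarily the calibrated rule the authors recommend. The function name `one_se_rule`, the per-fold losses, and the assumption that dictionary keys order models by complexity are all hypothetical. The standard error here naively treats folds as independent, which they are not, so it only approximates score-estimation uncertainty.

```python
import math
from statistics import mean, stdev


def one_se_rule(fold_scores):
    """Return the simplest model within one standard error of the best.

    fold_scores maps each model's complexity (here, a hypothetical count of
    predictors; smaller means simpler) to its per-fold CV losses, where
    lower is better.
    """
    summary = {}
    for complexity, scores in fold_scores.items():
        # Mean CV loss and a naive standard error that treats folds as
        # independent (they are not, so this SE is only approximate).
        se = stdev(scores) / math.sqrt(len(scores))
        summary[complexity] = (mean(scores), se)

    # Model with the lowest mean CV loss.
    best = min(summary, key=lambda c: summary[c][0])
    best_mean, best_se = summary[best]
    threshold = best_mean + best_se

    # Most parsimonious model whose mean loss is within one SE of the best.
    return min(c for c, (m, _) in summary.items() if m <= threshold)


# Hypothetical per-fold losses from 5-fold CV for models with 1-4 predictors.
fold_scores = {
    1: [2.0, 2.2, 1.8, 2.1, 1.9],
    2: [1.45, 1.65, 1.35, 1.55, 1.25],
    3: [1.4, 1.6, 1.3, 1.5, 1.2],
    4: [1.45, 1.55, 1.35, 1.5, 1.25],
}
print(one_se_rule(fold_scores))  # 2: model 3 scores best, but model 2 is within one SE
```

In this made-up example, the 3-predictor model has the lowest mean loss (1.40, SE about 0.07). The 2-predictor model's mean loss (1.45) falls within that one-SE band, so the rule prefers the simpler model and guards against selecting extra terms whose apparent gain is within estimation noise.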
Abstract
The growing use of model-selection principles in ecology for statistical inference is underpinned by information criteria (IC) and cross-validation (CV) techniques. Although IC techniques, such as Akaike's Information Criterion, have been historically more popular in ecology, CV is a versatile and increasingly used alternative. CV uses data splitting to estimate model scores based on (out-of-sample) predictive performance, which can be used even when it is not possible to derive a likelihood (e.g., machine learning) or count parameters precisely (e.g., mixed-effects models and penalised regression). Here we provide a primer to understanding and applying CV in ecology. We review commonly applied variants of CV, including approximate methods, and make recommendations for their use based on the statistical context. We explain some important -- but often overlooked -- technical aspects of…
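The data-splitting idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The helper names (`fit_mean`, `fit_line`, `kfold_score`) and the simulated data are hypothetical, and squared prediction error is an assumed choice of score. Because the score is computed only from predictions on held-out data, no likelihood or parameter count is needed, which is the versatility the abstract highlights.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    # Intercept-only model: always predict the training-set mean.
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    # Ordinary least-squares straight line, y = a + b * x.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: (my - b * mx) + b * x


def kfold_score(fit, xs, ys, k, seed=0):
    """Estimate out-of-sample mean squared error by K-fold CV.

    Each fold is held out once; the model is refitted on the remaining
    data and scored on the held-out points. k = len(xs) gives LOO CV.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    losses = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        losses.extend((ys[i] - model(xs[i])) ** 2 for i in test)
    return mean(losses)


# Simulated data (hypothetical): a linear trend plus Gaussian noise.
rng = random.Random(1)
xs = [i / 2 for i in range(20)]
ys = [1.0 + 0.8 * x + rng.gauss(0, 0.5) for x in xs]

for name, fit in [("intercept-only", fit_mean), ("linear", fit_line)]:
    five_fold = kfold_score(fit, xs, ys, k=5)
    loo = kfold_score(fit, xs, ys, k=len(xs))
    print(f"{name}: 5-fold MSE = {five_fold:.3f}, LOO MSE = {loo:.3f}")
```

The model with the lower CV score is preferred. Setting `k` to the number of observations yields leave-one-out CV. Smaller `k` is cheaper, but each model is then trained on less data, so K-fold estimates tend to overstate the error of a model fitted to the full data set.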
Taxonomy
Topics: Species Distribution and Climate Change · Soil Geostatistics and Mapping · Data Analysis with R
