# Do Random Forest-Driven Climate Envelope Models Require Variable Selection? A Case Study on Crustulina guttata (Theridiidae: Araneae)

**Authors:** Tae-Sung Kwon, Won Il Choi, Min-Jung Kim

PMC · DOI: 10.3390/insects16020209 · 2025-02-14

## TL;DR

This study examines whether using all 19 bioclimatic variables in Random Forest models improves predictions of species distribution, using a spider species as a case study.

## Contribution

The study shows that using all available variables in Random Forest models can outperform models with fewer, manually selected variables.

## Key findings

- The full model with all 19 variables consistently outperformed models with fewer variables.
- Randomly selected variable sets often performed as well as or better than manually curated ones.
- Using all variables may help avoid losing important information when ecological knowledge is limited.

## Abstract

The Climate Envelope Model (CEM) typically uses 19 bioclimatic variables to predict species distribution, but selecting ecologically meaningful variables for a target species is challenging. Random Forest (RF) models, which handle variable correlation, interaction, and nonlinearity well, were tested using an approach that includes all 19 variables. This was compared to three other model variants: a simplified model with two variables, a model with ecologically selected variables, and a model with statistically selected variables. The model using all variables generally performed better than those with fewer variables, and models with randomly selected variables often outperformed manually curated ones, showing the risk of losing important information during variable selection. The findings suggest that Crustulina guttata may have been artificially spread from Europe and highlight the advantages of using all available variables in RF models when the biological responses of a species are unclear. However, further research is necessary to confirm these results across other species and environmental contexts.

Climate Envelope Models (CEMs) commonly employ 19 bioclimatic variables to predict species distributions, yet selecting which variables to include remains a critical challenge. Although it seems logical to select ecologically relevant variables, the biological responses of many target species are poorly understood. Random Forest (RF), a popular method in CEMs, can effectively handle correlated and nonlinear variables. In light of these strengths, this study explores the full model hypothesis, which involves using all 19 bioclimatic variables in an RF model, using Crustulina guttata (Theridiidae: Araneae) as a test case. Four model variants—a simplified model with two variables, an ecologically selected model with seven variables, a statistically selected model with ten variables, and a full model with nineteen variables—were compared against a thousand randomly assembled models with matching variable counts. All models achieved high performance, though results varied based on the number of variables employed. Notably, the full model consistently produced stronger predictions than models with fewer variables. Moreover, specifying particular variables did not yield a significant advantage over random selections of equally sized sets, indicating that omitting variables may risk the loss of important information. Although the final model suggests that C. guttata may have dispersed beyond its native European range through artificial means, this study examined only a single species. Thus, caution is warranted in generalizing these findings, and additional research is needed to determine whether the full model hypothesis extends to other taxa and environmental contexts. In scenarios where ecological knowledge is limited, however, using all available variables in an RF model may preserve potentially significant predictors and enhance predictive accuracy.
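The comparison against randomly assembled models described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the design, not the authors' code: the `bio1`..`bio19` names, the helper functions `random_subset_scores` and `fraction_beaten`, and the `score` callable are all assumptions introduced here. In practice `score` would fit an RF model on the given variables and return a performance metric; this summary does not state which metric or software the paper used.

```python
import random
from typing import Callable, List, Sequence, Tuple

# The 19 bioclimatic variables; the "bio1".."bio19" names are an assumed convention.
BIOCLIM: List[str] = [f"bio{i}" for i in range(1, 20)]

# Hypothetical scorer: takes a variable subset, returns a performance score
# (for example, a cross-validated metric from an RF model fitted on that subset).
Scorer = Callable[[Tuple[str, ...]], float]


def random_subset_scores(
    score: Scorer,
    variables: Sequence[str],
    k: int,
    n_draws: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Score n_draws random subsets of size k, each drawn without replacement."""
    rng = random.Random(seed)
    pool = list(variables)
    return [score(tuple(rng.sample(pool, k))) for _ in range(n_draws)]


def fraction_beaten(curated_score: float, null_scores: Sequence[float]) -> float:
    """Share of random subsets that score strictly lower than the curated subset."""
    return sum(s < curated_score for s in null_scores) / len(null_scores)
```

Under these assumptions, a curated seven-variable model would be evaluated by calling `random_subset_scores(score, BIOCLIM, k=7)` and passing the curated model's own score to `fraction_beaten`. A fraction near 0.5 would indicate that the curated set does no better than a typical random set of the same size, which is the pattern the abstract reports. For `k=19` every draw contains the same variables, so any spread in the scores reflects only the randomness of model fitting.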

## Linked entities

- **Species:** Crustulina guttata (taxon 1871941), Mus musculus (taxon 10090)

## Full-text entities

- **Species:** Crustulina guttata (species) [taxon 1871941]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11857067/full.md

---
Source: https://tomesphere.com/paper/PMC11857067