Hybrid Spatial Representations for Species Distribution Modeling

Shiran Yuan; Hao Zhao

arXiv:2410.10937 · cs.LG · October 24, 2024

TL;DR

This paper introduces a hybrid spatial representation that combines implicit and explicit embeddings to improve species distribution modeling from presence-only data for many species at once, and reports that it outperforms previous methods.

Contribution

The paper proposes a hybrid embedding scheme that improves the spatial precision of species distribution models by capturing local feature variations that purely implicit representations tend to miss.

Findings

01

Outperforms previous models on the S&T and IUCN benchmarks

02

Hybrid representation captures local spatial details better

03

Addresses challenges with presence-only data and multiple species

Abstract

We address an important problem in ecology called Species Distribution Modeling (SDM), whose goal is to predict whether a species exists at a certain position on Earth. In particular, we tackle a challenging version of this task, where we learn from presence-only data in a community-sourced dataset, model a large number of species simultaneously, and do not use any additional environmental information. Previous work has used neural implicit representations to construct models that achieve promising results. However, implicit representations often generate predictions of limited spatial precision. We attribute this limitation to their inherently global formulation and inability to effectively capture local feature variations. This issue is especially pronounced with presence-only data and a large number of species. To address this, we propose a hybrid embedding scheme that combines both…
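The abstract breaks off before describing the hybrid scheme, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. It assumes the explicit part is a learnable vector at each node of a regular longitude/latitude grid, read out by bilinear interpolation, and the implicit part is a fixed multi-frequency sinusoidal encoding of the coordinates that a shared network would normally consume (that network is omitted here). The two parts are simply concatenated. All function names, the grid layout, and `n_freqs` are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: grid[row][col] is a learnable vector; rows span
# latitude -90..90 and columns span longitude -180..180 (nodes at both ends).
Grid = List[List[List[float]]]


def explicit_embedding(grid: Grid, lon: float, lat: float) -> List[float]:
    """Bilinearly interpolate the four grid vectors surrounding (lon, lat)."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Clamp to the valid range; longitude wrap-around is ignored in this sketch.
    lat = max(-90.0, min(90.0, lat))
    lon = max(-180.0, min(180.0, lon))
    # Continuous row/column coordinates of the query point.
    y = (lat + 90.0) / 180.0 * (n_rows - 1)
    x = (lon + 180.0) / 360.0 * (n_cols - 1)
    r0, c0 = int(math.floor(y)), int(math.floor(x))
    r1, c1 = min(r0 + 1, n_rows - 1), min(c0 + 1, n_cols - 1)
    dy, dx = y - r0, x - c0
    out = []
    for k in range(len(grid[0][0])):
        # Interpolate along longitude on the two bracketing rows, then along latitude.
        lower = (1 - dx) * grid[r0][c0][k] + dx * grid[r0][c1][k]
        upper = (1 - dx) * grid[r1][c0][k] + dx * grid[r1][c1][k]
        out.append((1 - dy) * lower + dy * upper)
    return out


def implicit_features(lon: float, lat: float, n_freqs: int = 4) -> List[float]:
    """Fixed sinusoidal encoding; an implicit model would feed this to a shared MLP."""
    feats = []
    for i in range(n_freqs):
        scale = 2.0 ** i
        for angle in (math.radians(lon), math.radians(lat)):
            feats.extend([math.sin(scale * angle), math.cos(scale * angle)])
    return feats


def hybrid_embedding(grid: Grid, lon: float, lat: float, n_freqs: int = 4) -> List[float]:
    """Concatenate the local (explicit) and global (implicit) parts."""
    return explicit_embedding(grid, lon, lat) + implicit_features(lon, lat, n_freqs)


# Toy 3 x 5 grid whose vector at node (r, c) is [r, c].
grid = [[[float(r), float(c)] for c in range(5)] for r in range(3)]
print(explicit_embedding(grid, 45.0, 45.0))     # [1.5, 2.5]
print(len(hybrid_embedding(grid, 45.0, 45.0)))  # 2 explicit + 4 * 2 * 2 implicit = 18
```

In this sketch each grid vector only influences queries in the cells adjacent to its node, so nearby locations can differ sharply without affecting distant ones, while the network consuming the sinusoidal features would share its parameters across all locations.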

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 2

Strengths

- Clear, convincing motivation for combining explicit and implicit representations: locality, non-Lipschitz boundaries, and parameter interpretability at grid cells.
- Limited data requirements: the authors use presence-only data (no presence–absence labels), require no additional environmental covariates, and rely on community-sourced datasets.
- Strong empirical gains on S&T and IUCN across data regimes.
- Explicit embeddings correlate more strongly with environmental proxies (GeoFeatures), supporting the design hypothesis.
- An…

Weaknesses

1. The paper mainly reports mAP and precision–recall curves. Since this is a spatial modeling task, the evaluation should also measure how well the predicted ranges match the actual geographic boundaries.
2. The model assumes that locations without observations are negative, but this may not always be true. I am not sure how these “pseudo-absences” affect the results.
3. The results are averaged across all species, but it would be helpful to know which types of species or regions benefit most from…
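Reviewer 01's second point concerns pseudo-absences. As a minimal sketch, assuming a loss in the style of the "assume negative" objectives of Cole et al. (2023) (whether the paper uses exactly this variant is an assumption), one presence record contributes a positive term for the recorded species, up-weighted by a factor `pos_weight`, a negative term for every other species at that location, and a negative term for every species at a uniformly sampled random location. The function name and the default weight are illustrative, not taken from the paper.

```python
import math
from typing import Sequence

EPS = 1e-7  # keeps log() finite for probabilities of exactly 0 or 1


def _log(p: float) -> float:
    """Log of a probability clamped to [EPS, 1 - EPS]."""
    return math.log(min(max(p, EPS), 1.0 - EPS))


def assume_negative_loss(p_obs: Sequence[float], p_rand: Sequence[float],
                         species: int, pos_weight: float = 2048.0) -> float:
    """Presence-only loss with pseudo-absences for a single observation (sketch).

    p_obs:   predicted presence probability of every species at the observed location
    p_rand:  predicted presence probability of every species at a random location
    species: index of the species that was actually recorded
    """
    n = len(p_obs)
    total = 0.0
    for j in range(n):
        if j == species:
            # The one confirmed presence, up-weighted to offset the many pseudo-absences.
            total -= pos_weight * _log(p_obs[j])
        else:
            # Unrecorded species at the observed location: assumed absent.
            total -= _log(1.0 - p_obs[j])
        # Every species at the random location: assumed absent.
        total -= _log(1.0 - p_rand[j])
    return total / n


print(assume_negative_loss([0.5, 0.5, 0.5], [0.5, 0.5, 0.5], species=0))
```

Because every unrecorded species contributes a negative term, a species that is present but simply unobserved at a location is still pushed toward absence there, which is the effect the reviewer asks about.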

Reviewer 02 · Rating 2 · Confidence 4

Strengths

[S1] The proposed hybrid approach achieves notable performance gains on the Cole et al. (2023) dataset, indicating that integrating implicit and explicit representations is more effective than relying on a single representation in location-only SDMs.
[S2] The study provides a series of comparative results between explicit, implicit, and hybrid methods using the Cole et al. (2023) dataset.

Weaknesses

[W1] Overall, I find that the paper shows a limited understanding of the SDM field, which results in weak motivation and unclear contributions.
- [W1.1] The introduction and related work sections are weak, and several statements are imprecise or incorrect. Some specific examples:
  - L37: Which models are you referring to here?
  - L50-53: This does not reflect the conclusions of Cole et al. (2023) and is inaccurate. High-quality data remain essential for obtaining precise SDMs. For instance…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The experimental results point towards an advantage of the proposed hybrid representation when compared to both the implicit and explicit representations.
- In particular, the proposed approach works better than (or at least as well as) a large implicit model while running much faster.
- The paper is easy to follow and well written.

Weaknesses

1. My main issue with this paper lies in its justification: (1) the argument is that implicit representations lead to parameters being shared across different locations (like between the Amazon and the Sahara). This is indeed the main advantage of such representations, since similar locations (in terms of species composition) will tend to share a similar representation; (2) a second argument is that implicit representations are unable to capture high-frequency spatial patterns. Although it is tr…

Reviewer 04 · Rating 2 · Confidence 5

Strengths

1. The motivation section emphasizes the problems of existing SDM models, which helps readers understand the challenges.
2. The map visualization also helps readers understand the problems.

Weaknesses

The design of the model has significant problems that need to be addressed:
1. The explicit location encoding is not new. It was proposed a long time ago [1], and its drawbacks have been examined in many previous studies [1, 2, 3]. It is subject to the MAUP (Modifiable Areal Unit Problem): at which resolution level should the explicit location encoder store its learnable parameters? If the cells are very small, the number of learnable parameters becomes very large, which leads to overfitting. The cell without any t…
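Reviewer 04's concern about resolution can be made concrete with a little arithmetic. Assuming an equal-angle global grid with `cells_per_degree` cells per degree in each direction and a hypothetical embedding width of 256 per cell (neither value is taken from the paper), the parameter count is 360 × 180 × cells_per_degree² × 256, so halving the cell size quadruples it.

```python
def explicit_grid_params(cells_per_degree: int, dim: int = 256) -> int:
    """Learnable parameters of an equal-angle global grid with one dim-vector per cell."""
    n_lon = 360 * cells_per_degree  # columns spanning 360 degrees of longitude
    n_lat = 180 * cells_per_degree  # rows spanning 180 degrees of latitude
    return n_lon * n_lat * dim


for cpd in (1, 2, 10):  # 1 degree, 0.5 degree, and 0.1 degree cells
    print(f"{1 / cpd:g} deg cells: {explicit_grid_params(cpd):,} parameters")
# 1 deg cells: 16,588,800 parameters
# 0.5 deg cells: 66,355,200 parameters
# 0.1 deg cells: 1,658,880,000 parameters
```

Finer grids therefore multiply the number of per-cell vectors that must be learned from the same set of records, which is the overfitting risk the reviewer describes.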

Code & Models

Repositories

shiran-yuan/hsr-sdm
JAX · Official

Taxonomy

Topics: Isotope Analysis in Ecology · Species Distribution and Climate Change · Genetic diversity and population structure