# GeoGenIE: a deep learning approach to predict geographic provenance of biodiversity samples from genomic SNPs

**Authors:** Bradley T Martin, Zachery D Zbinden, Michael E Douglas, Marlis R Douglas, Tyler K Chafin

PMC · DOI: 10.1093/bioadv/vbaf250 · Bioinformatics Advances · 2025-10-09

## TL;DR

GeoGenIE is a new deep learning tool that predicts where biodiversity samples come from using genomic data, improving accuracy even in sparsely sampled regions.

## Contribution

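GeoGenIE is an open-source PyTorch package that predicts geographic origin from genomic SNPs. It pairs a multilayer perceptron with automated hyperparameter tuning, preprocessing, geo-genetic outlier detection, and novel data augmentation.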

## Key findings

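- Benchmarked against a comparable approach on white-tailed deer (Odocoileus virginianus) ddRAD sequencing data, GeoGenIE achieved substantially higher geolocation accuracy with less spatial bias while using a smaller SNP panel.
- Accuracy gains were most evident in undersampled regions, indicating that the method remains effective under uneven sampling.
- Parallelized execution produced fast runtimes, which supports application to large genomic datasets.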
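
The findings above are stated in terms of geolocation accuracy. One common way to quantify this is the great-circle distance between predicted and true sampling coordinates. The sketch below assumes a haversine distance and a median summary; this summary does not state which error metric the paper reports, and the function name, variable names, and coordinates are made up for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical (longitude, latitude) pairs, not data from the paper.
true_coords = [(-92.3, 34.7), (-94.1, 36.1), (-91.5, 33.2)]
pred_coords = [(-92.1, 34.9), (-93.8, 35.8), (-92.0, 33.6)]

# Per-sample prediction error, then a robust summary across samples.
errors_km = [haversine_km(*t, *p) for t, p in zip(true_coords, pred_coords)]
median_error_km = median(errors_km)
```

A median is used here because a few badly mispredicted samples can dominate a mean; comparing this summary between densely and sparsely sampled regions would be one way to look at spatial bias.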

## Abstract

Determining the geographic origin of samples is a common objective in wildlife management, forensics, and conservation. Current methods often assume evolutionary models or require extensive reference datasets, which are costly and difficult to develop, and they perform poorly with uneven or biased sampling. Supervised deep learning offers a promising alternative by learning complex patterns without prior model specifications. Combined with novel geo-genetic data augmentation and preprocessing techniques, it can reduce reference panel demands and improve performance across diverse sampling schemes, broadening accurate provenance determination to more study systems.

We present GeoGenIE, an open-source software package powered by PyTorch for geographic provenance prediction from genomic data. GeoGenIE implements a multilayer perceptron architecture within an automated hyperparameter tuning framework, incorporating preprocessing, geo-genetic outlier detection, and data augmentation to improve accuracy in sparsely sampled regions. Benchmarked against a comparable approach on white-tailed deer (Odocoileus virginianus) double digest restriction-site associated DNA sequencing data, GeoGenIE achieved substantially improved geolocation accuracy with less spatial bias using a smaller SNP panel. Gains were most evident in undersampled regions, underscoring its effectiveness under challenging conditions. Its parallelized execution also produced fast runtimes, supporting its application to large datasets.
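
To make the multilayer perceptron idea concrete, here is a minimal PyTorch sketch of a network that regresses two coordinates from a genotype matrix. It is not GeoGenIE's actual architecture or API: the class name `CoordinateMLP`, the layer sizes, the dropout rate, the loss, and the random toy data are all assumptions chosen for illustration, and the package's hyperparameter tuning, outlier detection, and augmentation steps are omitted.

```python
import torch
from torch import nn

torch.manual_seed(0)

N_SAMPLES, N_SNPS = 64, 200  # hypothetical toy dimensions

# Toy genotype matrix: alternate-allele counts (0, 1, or 2) per sample and SNP.
genotypes = torch.randint(0, 3, (N_SAMPLES, N_SNPS)).float()
# Toy targets standing in for standardized (longitude, latitude) pairs.
coords = torch.randn(N_SAMPLES, 2)


class CoordinateMLP(nn.Module):
    """Hypothetical MLP mapping a genotype vector to two coordinates."""

    def __init__(self, n_snps: int, hidden: int = 64, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_snps, hidden),
            nn.ELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, hidden),
            nn.ELU(),
            nn.Linear(hidden, 2),  # two outputs: longitude and latitude
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = CoordinateMLP(N_SNPS)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # assumed loss; other regression losses would also work

# A few full-batch training steps on the toy data.
model.train()
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(genotypes), coords)
    loss.backward()
    optimizer.step()

# Switch off dropout and predict coordinates without tracking gradients.
model.eval()
with torch.no_grad():
    predicted = model(genotypes)
```

In a real workflow the predictions would be made on held-out samples and mapped back from standardized units to degrees before computing any distance-based error.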

GeoGenIE is open source and available at https://github.com/btmartin721/geogenie and https://pypi.org/project/GeoGenIE/.

## Linked entities

- **Species:** Odocoileus virginianus (taxon 9874)

## Full-text entities

- **Species:** Odocoileus virginianus (white-tailed deer, species) [taxon 9874]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12596584/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12596584/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12596584/full.md

---
Source: https://tomesphere.com/paper/PMC12596584