# StrataSeq: A Workflow for Rapid Development of Molecular Databases for Hard‐To‐Identify Species

**Authors:** Anna K. Merges, Peter Manning, Dennis Baulechner, Katharina John, Andrey Zaitsev, Volkmar Wolters, Damian Baranski, Hans‐Peter Grossart, Jason Woodhouse, Clément Schneider, Miklós Bálint

PMC · DOI: 10.1002/ece3.72375 · 2025-10-23

## TL;DR

StrataSeq is a new method to efficiently create DNA reference databases for hard-to-identify species, improving biodiversity monitoring.

## Contribution

StrataSeq introduces a systematic workflow to optimize DNA reference database generation for species-rich, hard-to-identify taxa.

## Key findings

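- Benchmarked on two Collembola datasets from grassland soils in Germany, StrataSeq captured 69% of the species found by a traditional approach (identifying randomly selected individuals from all samples) while requiring only 22% of the effort.
- The workflow is adaptable to various organism groups and environmental settings, including both spatial and temporal gradients.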
- StrataSeq enhances cost-effectiveness and scalability of molecular database development.
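
The benchmark figures can be read as a rough yield-per-effort ratio. This is a back-of-envelope calculation, not a statistic reported by the authors, and it assumes that both percentages are expressed relative to the traditional approach (which by definition scores 100% on both):

$$
\frac{\text{share of species captured}}{\text{share of effort spent}} = \frac{0.69}{0.22} \approx 3.1
$$

Under that assumption, StrataSeq recovers roughly three times as many species per unit of effort as the traditional approach.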

## Abstract

Biodiversity loss necessitates improved monitoring of small, species‐rich taxa, such as protists, phyto‐ and zooplankton and terrestrial invertebrates. Traditional biomonitoring is often infeasible for these taxa due to complex morphology and few taxonomists. DNA‐based approaches offer promising solutions by enabling rapid species identification. However, the effectiveness of these methods depends on the completeness of molecular reference databases, which remain incomplete, particularly for remote and biodiverse regions. To address this, we propose the StrataSeq workflow, a systematic approach to optimise the generation of DNA reference databases for hard‐to‐identify taxa. Reference sequences allow us to connect molecular operational taxonomic units to a wealth of information available for many described taxa. StrataSeq consists of four key steps: (1) Habitat‐stratified sample subsetting selects a minimal but ecologically representative sample set by stratifying along key environmental gradients. (2) Prioritising morphospecies involves sorting specimens into morphospecies and ranking them based on their occurrence across samples, prioritising common taxa for detailed identification. (3) Detailed morphological identification focuses on common morphospecies to maximise taxonomic coverage while minimising effort. (4) Reference DNA sequence generation targets taxa lacking molecular references, with sequenced specimens deposited as museum vouchers. We benchmarked the StrataSeq workflow using two datasets of Collembola from grassland soils in Germany. In comparison with a species list generated by a more labour‐intensive traditional approach (identification of randomly selected individuals from all samples), the StrataSeq workflow captured 69% of species but required only 22% of the effort. StrataSeq is adaptable to various organism groups and environmental settings, including both spatial and temporal gradients. 
The workflow enhances the cost‐effectiveness of generating reference DNA databases, supporting improved biodiversity monitoring and ecological research. StrataSeq offers a scalable solution to accelerate the completion of molecular databases, thereby improving biomonitoring and ecosystem assessments under global change pressures.

Biodiversity loss calls for better monitoring of small, species‐rich taxa, such as soil invertebrates, but traditional methods are limited due to complex morphology and lack of expertise. DNA‐based approaches offer a solution, but their effectiveness depends on incomplete molecular reference databases. The StrataSeq workflow optimises DNA reference generation through habitat stratification, prioritising common species for detailed identification and sequencing specimens lacking molecular references, making biodiversity monitoring more efficient and cost‐effective, as shown by its successful application to Collembola in German grasslands.
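
The first two steps described above (habitat-stratified sample subsetting and morphospecies prioritisation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the single `moisture` gradient, the sample IDs, the morphospecies labels, and the `min_occurrence` threshold are all hypothetical, and the paper may stratify and rank differently.

```python
from collections import defaultdict


def stratified_subset(samples, gradient_key, n_strata):
    """Pick one sample per stratum along a single environmental gradient.

    Assumption: strata are contiguous, near-equal-sized bins of the samples
    ordered by the gradient value, and the middle sample represents each bin.
    """
    ordered = sorted(samples, key=lambda s: s[gradient_key])
    n = len(ordered)
    chosen = []
    for i in range(n_strata):
        stratum = ordered[i * n // n_strata:(i + 1) * n // n_strata]
        if stratum:  # skip empty bins when n_strata exceeds the sample count
            chosen.append(stratum[len(stratum) // 2])
    return chosen


def rank_morphospecies(specimens_by_sample):
    """Rank morphospecies by the number of samples in which they occur."""
    occurrence = defaultdict(int)
    for morphospecies in specimens_by_sample.values():
        for name in set(morphospecies):  # count each sample once per taxon
            occurrence[name] += 1
    # Most widespread first; ties broken alphabetically for reproducibility.
    return sorted(occurrence.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical samples along a hypothetical soil-moisture gradient.
samples = [
    {"id": "S1", "moisture": 0.10},
    {"id": "S2", "moisture": 0.15},
    {"id": "S3", "moisture": 0.30},
    {"id": "S4", "moisture": 0.35},
    {"id": "S5", "moisture": 0.60},
    {"id": "S6", "moisture": 0.70},
]
subset = stratified_subset(samples, "moisture", n_strata=3)
subset_ids = [s["id"] for s in subset]

# Hypothetical morphospecies sorted from the subset samples only.
specimens_by_sample = {
    "S2": ["morpho_A", "morpho_B", "morpho_A"],
    "S4": ["morpho_A", "morpho_C"],
    "S6": ["morpho_A", "morpho_B", "morpho_D"],
}
ranking = rank_morphospecies(specimens_by_sample)

# Common morphospecies go forward to detailed morphological identification.
min_occurrence = 2  # hypothetical cut-off
priority = [name for name, n_samples in ranking if n_samples >= min_occurrence]

print(subset_ids)
print(ranking)
print(priority)
```

Ranking by occurrence across samples rather than by raw abundance follows the abstract's wording ("ranking them based on their occurrence across samples"); a real application would likely stratify along several gradients at once and then check the prioritised taxa against existing reference databases before sequencing.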

## Linked entities

- **Species:** Collembola (taxon 30001)

## Full-text entities

- **Chemicals:** ethanol (MESH:D000431), water (MESH:D014867), histosol (MESH:C027178), albeluvisol (-)
- **Species:** Collembola (snow fleas, class) [taxon 30001], PX clade (clade) [taxon 569578], Protaphorura armata (species) [taxon 187684], Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12547479/full.md

---
Source: https://tomesphere.com/paper/PMC12547479