# Fine-Grained Assignment of Unknown Marine eDNA Sequences Using Neural Networks

**Authors:** Sébastien Villon, Morgan Mangeas, Véronique Berteaux-Lecellier, Laurent Vigliola, Gaël Lecellier

PMC · DOI: 10.3390/biology15030285 · Biology · 2026-02-05

## TL;DR

This paper introduces a deep learning method to improve the classification of marine environmental DNA sequences when species are not in reference databases.

## Contribution

The novel contribution is a neural network approach that leverages nucleotide positional patterns for accurate genus- and family-level taxonomic assignments.
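To make "positional patterns" concrete, here is a minimal sketch of one common way to give a network position-aware input: one-hot encoding an aligned sequence so that each alignment column always maps to the same row. This is an illustrative assumption, not the authors' published encoding; the `ALPHABET` ordering, the gap channel, and the `one_hot_encode` helper are hypothetical.

```python
# Hypothetical alphabet: the four bases plus the alignment gap character.
ALPHABET = "ACGT-"


def one_hot_encode(aligned_seq, length):
    """Encode an aligned sequence as a `length` x 5 matrix (list of rows).

    Row i describes alignment column i, so the same column lands in the
    same row for every sequence. Ambiguous bases (e.g. N) become all-zero
    rows; sequences shorter than `length` are padded with gap rows.
    """
    seq = aligned_seq.upper()[:length].ljust(length, "-")
    matrix = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # unknown symbols keep an all-zero row
            row[idx] = 1
        matrix.append(row)
    return matrix


# Toy input, not real TELEO data.
encoded = one_hot_encode("AC-GTN", 8)
```

Because rows are tied to alignment columns rather than to raw string offsets, a model reading this matrix can in principle learn which positions are informative, which is the property the summary attributes to the method.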

## Key findings

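- In an in silico validation on NCBI GenBank sequences, the method achieves average accuracies of 94.7% at the genus level and 86.5% at the family level.
- It outperforms the tested reference-based tools (Obitools, Kraken2, Lolo) and alternative sequence embedding methods when target species are absent from the reference database.
- The approach remains robust with limited training data and performs better when sequence alignment preserves nucleotide positional information.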

## Abstract

Environmental DNA (eDNA) metabarcoding is increasingly used to monitor biodiversity by detecting traces of DNA left by organisms in the environment. While this approach allows the simultaneous detection of many species, its effectiveness is often limited by incomplete reference databases, especially in marine ecosystems. As a result, species-level identification is frequently unreliable or impossible, leading to large proportions of unassigned sequences. In this study, we propose a deep learning-based approach designed to improve taxonomic assignment at higher levels, such as genus and family, even when species are absent from reference databases. By learning patterns directly from DNA sequences, our method provides more accurate and consistent assignments than commonly used bioinformatic tools under constrained conditions. Although species-level identification remains essential when feasible, reliable genus- and family-level information already supports many ecological applications, including functional analyses, community comparisons, and long-term monitoring. Our results highlight the potential of artificial intelligence to complement existing eDNA tools and enhance biodiversity assessments in data-limited contexts.

Environmental DNA (eDNA) metabarcoding is an innovative tool that is transforming ecological research. It offers a simple and effective method for simultaneously detecting numerous species across a wide range of environments. The method relies on assigning DNA sequences sampled from the environment to taxa, which is straightforward for species that have already been sequenced and are represented in reference databases. However, existing bioinformatics tools often fail to deliver accurate, fine-grained assignments when target species are absent from these databases. This limitation arises from handcrafted classification thresholds that do not account for nucleotide positional information. Here, we propose a deep neural architecture specifically designed to exploit both nucleotide identity and positional patterns in short TELEO sequences. Using an in silico validation framework based on NCBI GenBank sequences, we compare our approach with several state-of-the-art bioinformatics tools (Obitools, Kraken2, Lolo), as well as alternative sequence embedding methods, under controlled conditions. Our approach achieves average accuracies of 94.7% at the genus level and 86.5% at the family level, substantially outperforming the tested reference-based pipelines. The method remains robust with limited training data and shows improved performance when nucleotide positional information is preserved through sequence alignment. These results demonstrate the potential of AI-powered eDNA metabarcoding to complement existing taxonomic assignment tools, particularly in contexts where reference databases are incomplete or species-level resolution is not achievable, thereby supporting biodiversity monitoring and ecosystem management.
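The evaluation setting described above, where target species are missing from the reference data, can be imitated with a species-held-out split. The sketch below is one plausible way to build such a split and is not taken from the paper: the `split_by_species` helper, the `test_fraction` parameter, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_species(records, test_fraction=0.2, seed=0):
    """Split (sequence, species, genus) records by holding out whole species.

    No held-out species appears in training, but each held-out species
    keeps at least one congener in training, so a genus-level answer
    remains possible for every test sequence.
    """
    genus_of = {species: genus for _, species, genus in records}
    remaining = defaultdict(set)  # genus -> species still in training
    for species, genus in genus_of.items():
        remaining[genus].add(species)

    species_list = sorted(genus_of)
    random.Random(seed).shuffle(species_list)
    target = int(len(species_list) * test_fraction)

    held_out = set()
    for species in species_list:
        if len(held_out) >= target:
            break
        genus = genus_of[species]
        if len(remaining[genus]) > 1:  # keep the genus represented in training
            remaining[genus].discard(species)
            held_out.add(species)

    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test


# Toy placeholder records (sequence, species, genus); not real data.
records = [
    ("ACGT", "A_one", "A"), ("ACGA", "A_two", "A"), ("ACGC", "A_three", "A"),
    ("TTGA", "B_one", "B"), ("TTGC", "B_two", "B"),
]
train, test = split_by_species(records, test_fraction=0.4)
```

Genus-level accuracy measured on `test` then reflects how well a classifier assigns sequences from species it has never seen, which is the constrained condition the abstract refers to. The same idea extends to family level by holding out whole genera.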

## Full-text entities

- **Diseases:** injury to (MESH:D014947)
- **Chemicals:** water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12897059/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12897059/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12897059/full.md

---
Source: https://tomesphere.com/paper/PMC12897059