# Performance and Limitations of Out‐Of‐Distribution Detection for Insect DNA Barcoding

**Authors:** Tomochika Fujisawa, Takashi Imai

PMC · DOI: 10.1002/ece3.73112 · Ecology and Evolution · 2026-02-17

## TL;DR

This study examines how well DNA barcoding can detect insect species that are missing from reference databases. It finds that short DNA fragments degrade the detection of such unknown species more than the identification of known species.

## Contribution

The study provides empirical guidelines for improving out-of-distribution detection in DNA barcoding by analyzing how short and noisy sequences limit its performance.

## Key findings

- Out-of-distribution detection is more sensitive to sequence noise and short fragment lengths than species identification.
- Detection performance drops sharply for fragments shorter than 300 bp, regardless of the detection method used.
- DNA barcoding remains accurate for known species identification even with short, noisy fragments.
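The findings above concern short and noisy query fragments. A benchmark of this kind needs such fragments, and the sketch below shows one minimal way to simulate them from a full-length barcode. The substitution-only noise model, the uniformly placed window, and the helper names `random_fragment` and `add_substitution_noise` are assumptions made for illustration. They are not the authors' documented protocol.

```python
import random


def random_fragment(seq, length, rng):
    """Return a contiguous window of `length` sites starting at a random position."""
    if length > len(seq):
        raise ValueError("fragment is longer than the source sequence")
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]


def add_substitution_noise(seq, rate, rng):
    """Independently replace each A/C/G/T with probability `rate` by one of the
    other three nucleotides; gaps and ambiguity codes are left unchanged."""
    out = []
    for base in seq:
        if base in "ACGT" and rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is reproducible
    barcode = "ACGT" * 165   # placeholder 660-site sequence, not real COI data
    for length in (600, 300, 150):
        # Hypothetical 1% per-site substitution noise on a random window.
        query = add_substitution_noise(random_fragment(barcode, length, rng), 0.01, rng)
        print(length, len(query))
```

Substitutions alone ignore insertions and deletions, which real sequencing errors also produce. A fuller simulation would need an indel model and realignment.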

## Abstract

Successful applications of DNA barcoding rely on the accurate taxonomic identification of sequence fragments. When biological surveys with DNA barcoding target underexplored biological communities, sequence‐based identification is often conducted using incomplete databases that do not fully cover the regional species pool. Consequently, specimens to be identified may include species not present in reference databases. Such unknown or “out‐of‐distribution” samples can cause misidentification if left undetected. A similarity cutoff is commonly used to detect out‐of‐distribution samples before taxonomic assignment, but its effectiveness has not been carefully studied. In this study, we evaluated the performance of out‐of‐distribution detection for DNA barcoding with genetic distance and deep learning metrics. Using extensively sampled datasets of multiple insect taxa, we measured the performance of identification and out‐of‐distribution detection under conditions in which genetic variations in species were sufficiently sampled. Although identification with DNA barcoding is a highly accurate process, even with short noisy fragments, out‐of‐distribution detection was more susceptible to a reduction in performance due to sequence noise and a lack of diagnosable characters. When fragments shorter than 300 bp were used for out‐of‐distribution detection, large performance reductions were observed irrespective of detection methods. Our results provide guidelines for designing unknown‐proof identification procedures by determining factors affecting out‐of‐distribution detection performance.
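The similarity-cutoff approach mentioned in the abstract can be sketched as a nearest-neighbour search followed by a distance threshold. This minimal sketch assumes pre-aligned sequences of equal length, the uncorrected p-distance, and an illustrative cutoff of 0.03. The paper itself evaluates genetic distance and deep learning metrics whose exact definitions are in the full text. `p_distance` and `identify_or_reject` are hypothetical helper names. The last example illustrates the "lack of diagnosable characters" failure mode: on a 4-site window the unknown query differs from no reference and is silently assigned to a known species.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences of equal length.
    Sites where either sequence has a gap or ambiguous base are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def identify_or_reject(query, references, cutoff=0.03):
    """Assign `query` to its nearest reference species, or flag it as
    out-of-distribution (label None) if that nearest distance exceeds `cutoff`.

    `references` is a list of (species_label, aligned_sequence) pairs.
    Ties keep the first reference encountered."""
    best_label, best_dist = None, float("inf")
    for label, seq in references:
        d = p_distance(query, seq)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > cutoff:
        return None, best_dist  # too far from every reference: treat as unknown
    return best_label, best_dist


if __name__ == "__main__":
    # Toy 10-site "barcodes"; real COI barcodes are hundreds of sites long.
    refs = [("sp_A", "ACGTACGTAC"), ("sp_B", "ACGTTCGAAC")]
    unknown = "ACGTATGTTC"  # a species absent from the reference set

    print(identify_or_reject(unknown, refs))  # -> (None, 0.2): correctly rejected

    # Same query and references truncated to the first 4 sites: the window
    # contains no diagnosable differences, so the unknown is misassigned.
    short_refs = [(label, seq[:4]) for label, seq in refs]
    print(identify_or_reject(unknown[:4], short_refs))  # -> ('sp_A', 0.0)
```

A fixed cutoff is simple to apply, but it treats every species pair as equally divergent. This is one reason why its effectiveness needs to be measured rather than assumed.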

Benchmarking shows that, with short DNA barcode fragments, detecting unknown species is more difficult than identifying known species.
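Out-of-distribution detection performance is often summarized with a threshold-free score such as the area under the ROC curve (AUROC). The sketch below assumes that metric for illustration; the paper's exact evaluation measures are given in the full text. It computes AUROC from per-query out-of-distribution scores (for example, the distance from each query to its nearest reference). It uses the pairwise Mann-Whitney formulation: the probability that a randomly chosen unknown-species query scores higher than a randomly chosen known-species query, with ties counted as one half. The function name `auroc` is hypothetical.

```python
def auroc(known_scores, unknown_scores):
    """AUROC for out-of-distribution detection, where a higher score means
    'more likely unknown'. Computed as the fraction of (unknown, known)
    pairs in which the unknown query scores higher; ties count 0.5."""
    if not known_scores or not unknown_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for u in unknown_scores:
        for k in known_scores:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


if __name__ == "__main__":
    # Made-up nearest-reference distances, for illustration only.
    known = [0.01, 0.02, 0.00]
    unknown = [0.08, 0.015, 0.12]
    print(auroc(known, unknown))  # 8 of 9 pairs ranked correctly
```

A score of 1.0 means every unknown query is ranked above every known one, and about 0.5 means the score carries no information. The double loop is O(n·m), which is fine for small benchmarks. For large query sets, a rank-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.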

## Full-text entities

- **Species:** Diptera (flies, order) [taxon 7147], Apis mellifera (bee, species) [taxon 7460], Cryptocephalus (genus) [taxon 204943], Homo sapiens (human, species) [taxon 9606], Vespidae (wasps, family) [taxon 7438]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12912926/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12912926/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12912926/full.md

---
Source: https://tomesphere.com/paper/PMC12912926