# High-Resolution Microbial Fingerprinting for Forensic Individual Identification: A Proof-of-Concept Study Integrating 2bRAD-M and Hierarchical Attention Network

**Authors:** Haoran Li, Zhiyao Yu, Zhijing Wu, Yuxin Lin, Tao Liu, Yuli Liu, Juan An, Jing Zhao, Yan Liu, Xueman Ma, Haiyan Wang

PMC · DOI: 10.3390/genes17030263 · Genes · 2026-02-26

## TL;DR

This proof-of-concept study shows that skin and saliva microbial communities can identify individuals with high accuracy (98.7% Rank-1 on pristine samples) by combining 2bRAD-M sequencing with a hierarchical attention network (HAN).

## Contribution

The paper proposes a framework that integrates 2bRAD-M sequencing with a hierarchical attention network (HAN), trained with triplet margin loss, for forensic individual identification.

## Key findings

- The HAN model achieved 98.7% Rank-1 accuracy for pristine samples, outperforming random forest (70.2%) and CNN (75.8%).
- Microbial signatures showed high temporal stability (ICC = 0.86 over 180 days) and remained robust in mixed samples (87.4% accuracy).
- Particulate matter exposure significantly influenced microbial composition (PERMANOVA R² = 0.32, p < 0.001).
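
To make the Rank-1 metric concrete: a query sample counts as correct when its single nearest reference profile belongs to the same individual. The sketch below is a minimal illustration under assumptions, not the paper's evaluation code. The function names, the Euclidean distance, and the toy two-dimensional embeddings are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery entry has the same ID.

    queries, gallery: lists of (individual_id, embedding) pairs.
    """
    hits = 0
    for true_id, q_vec in queries:
        # Pick the gallery entry closest to the query embedding.
        nearest_id, _ = min(gallery, key=lambda item: euclidean(q_vec, item[1]))
        hits += nearest_id == true_id
    return hits / len(queries)

# Toy data (hypothetical): one reference embedding per individual.
gallery = [("A", [0.0, 0.0]), ("B", [1.0, 1.0]), ("C", [2.0, 0.0])]
# The third query is labeled "C" but lies closest to "B", so it is a miss.
queries = [("A", [0.1, -0.1]), ("B", [0.9, 1.2]), ("C", [1.1, 0.9])]

score = rank1_accuracy(queries, gallery)
print(f"Rank-1 accuracy: {score:.3f}")
```

In this toy case two of three queries match their nearest reference, so the score is 2/3.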

## Abstract

**Background:** Human skin and saliva microbial communities have emerged as promising forensic biomarkers due to their individual specificity. However, existing studies are limited by small sample sizes and methodological inconsistencies. This proof-of-concept study aims to develop a novel framework integrating 2bRAD-M sequencing with a hierarchical attention network (HAN) for forensic individual identification, addressing these limitations through large-scale public data integration and controlled validation.

**Methods:** We utilized 2263 skin and saliva samples from public databases (Qiita, HMP, NCBI SRA) for model development. These public data included longitudinal samples collected over periods of up to 180 days. For validation, a contemporary cohort of 6 volunteers provided 26 forensic-relevant samples (including simulated touch evidence), which were sequenced using 2bRAD-M. Data integration involved batch effect correction (ComBat), normalization (CSS), and cross-database harmonization using GTDB for taxonomic assignment. The HAN model was optimized with triplet margin loss for metric learning.

**Results:** The HAN model achieved 98.7% Rank-1 accuracy for pristine samples, outperforming random forest (70.2%) and CNN (75.8%). Microbial signatures showed high temporal stability (ICC = 0.86 over 180 days) and robustness in mixed samples (87.4% accuracy). Discriminatory biomarkers included Cutibacterium (skin) and Prevotella (saliva). Particulate matter exposure significantly influenced microbial composition (PERMANOVA R² = 0.32, p < 0.001).

**Conclusions:** This study establishes a proof-of-concept pipeline for microbial forensics, demonstrating high accuracy under controlled conditions. Future work must address antibiotic exposure, sample diversity, and cross-laboratory validation before forensic implementation.
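
The abstract states that the HAN was optimized with triplet margin loss for metric learning. In its standard form, that loss is max(0, d(anchor, positive) − d(anchor, negative) + margin): it pulls samples from the same individual together and pushes samples from different individuals at least one margin further away. The sketch below illustrates the general technique only. The Euclidean distance, the margin of 1.0, and the toy vectors are assumptions, since the summary does not give the paper's settings.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss for a single triplet.

    anchor/positive: embeddings from the same individual.
    negative: embedding from a different individual.
    margin: assumed value; the paper's setting is not given here.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1.0 from the anchor
easy_negative = [3.0, 4.0]  # distance 5.0: already far enough away
hard_negative = [0.0, 1.5]  # distance 1.5: inside the margin

easy_loss = triplet_margin_loss(anchor, positive, easy_negative)
hard_loss = triplet_margin_loss(anchor, positive, hard_negative)
print(easy_loss, hard_loss)
```

The easy negative contributes zero loss because it is already more than one margin further from the anchor than the positive. The hard negative yields a positive loss, so only that kind of triplet would drive an update during training.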

## Linked entities

- **Species:** Cutibacterium (taxon 1912216), Prevotella (taxon 838)

## Full-text entities

- **Chemicals:** Particulate (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Prevotella (genus) [taxon 838], Cutibacterium (genus) [taxon 1912216]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13025914/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025914/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/PMC13025914/full.md

---
Source: https://tomesphere.com/paper/PMC13025914