# A Structure-Based Deep Learning Framework for Correcting Marine Natural Products’ Misannotations Attributed to Host–Microbe Symbiosis

**Authors:** Xiaohe Tian, Chuanyu Lyu, Yiran Zhou, Liangren Zhang, Aili Fan, Zhenming Liu

PMC · DOI: 10.3390/md24010020 · 2026-01-01

## TL;DR

A structure-based deep learning framework corrects biological-origin misannotations in marine natural product databases caused by host–microbe symbiosis, supporting drug discovery and biosynthetic studies.

## Contribution

A novel structure-based deep learning workflow is introduced to detect and correct misannotations in marine natural product datasets.

## Key findings

- The model achieves 85.56% balanced accuracy in predicting microbial origins of marine natural products.
- 3996 compounds whose predicted microbial origin contradicts their Animalia label are identified as putative symbiotic metabolites.
- Interpretability analysis reveals biologically coherent structural patterns among misannotated compounds.
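
For readers unfamiliar with the metric in the first finding, balanced accuracy is the mean of per-class recall, so a majority class cannot dominate the score. The sketch below illustrates the standard definition on toy labels. The labels and the function name are hypothetical, and nothing here reproduces the paper's data or evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical, not from the paper): 1 = microbial, 0 = non-microbial.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

# Recall for class 1 is 3/4 and for class 0 is 1/2, so the mean is 0.625,
# while plain accuracy on the same predictions would be 4/6.
score = balanced_accuracy(y_true, y_pred)
print(score)
```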

## Abstract

Marine natural products (MNPs) are a diverse group of bioactive compounds with varied chemical structures, but their biological origins are often misannotated due to complex host–microbe symbiosis. Propagated through public databases, such errors hinder biosynthetic studies and AI-driven drug discovery. Here, we develop a structure-based workflow for origin classification and misannotation correction in marine datasets. Using CMNPD and NPAtlas compounds, we integrate a two-step cleaning strategy that detects label inconsistencies and filters structural outliers with a microbial-pretrained graph neural network. The optimized model achieves a balanced accuracy of 85.56% and identifies 3996 compounds whose predicted microbial origins contradict their Animalia labels. These putative symbiotic metabolites cluster within known high-risk taxa, and interpretability analysis reveals biologically coherent structural patterns. This framework provides a scalable quality-control approach for natural product databases and supports more accurate biosynthetic gene cluster (BGC) tracing, host selection, and AI-driven marine natural product discovery.
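
The conflict-detection idea in the abstract, flagging compounds annotated as Animalia that a classifier predicts to be microbial, can be sketched as a simple filter. This is a minimal illustration only: the `Compound` record, its field names, the toy scores, and the 0.5 threshold are all assumptions and are not taken from the paper, which uses a graph neural network to produce its predictions.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    compound_id: str        # hypothetical identifier
    annotated_kingdom: str  # origin label as recorded in the database
    p_microbial: float      # assumed classifier score for microbial origin


def flag_conflicts(compounds, threshold=0.5):
    """Return compounds labeled Animalia but predicted to be microbial.

    The threshold is an illustrative assumption, not the paper's setting.
    """
    return [
        c for c in compounds
        if c.annotated_kingdom == "Animalia" and c.p_microbial >= threshold
    ]


# Toy records (hypothetical, not from CMNPD or NPAtlas).
toy = [
    Compound("cpd-1", "Animalia", 0.92),  # label and prediction disagree
    Compound("cpd-2", "Animalia", 0.10),  # label and prediction agree
    Compound("cpd-3", "Bacteria", 0.97),  # already annotated as microbial
]

flagged_ids = [c.compound_id for c in flag_conflicts(toy)]
print(flagged_ids)
```

Only the first toy record is flagged, since it is the only one whose database label and predicted origin disagree.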

## Full-text entities

- **Chemicals:** MNPs

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12843494/full.md

---
Source: https://tomesphere.com/paper/PMC12843494