A Structure-Based Deep Learning Framework for Correcting Marine Natural Products’ Misannotations Attributed to Host–Microbe Symbiosis
Xiaohe Tian, Chuanyu Lyu, Yiran Zhou, Liangren Zhang, Aili Fan, Zhenming Liu

TL;DR
A structure-based deep learning framework is developed to correct origin misannotations in marine natural products caused by host–microbe symbiosis, supporting more reliable biosynthetic studies and AI-driven drug discovery.
Contribution
A novel structure-based deep learning workflow is introduced to detect and correct misannotations in marine natural product datasets.
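The correction step can be illustrated with a minimal sketch. A compound annotated as Animalia but assigned a high probability of microbial origin by a structure-based classifier becomes a candidate misannotation, possibly a symbiont-derived metabolite. This is not the authors' code: the `Compound` record, its field names, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    compound_id: str    # hypothetical identifier
    label: str          # database-annotated origin, e.g. "Animalia" or "Microbial"
    p_microbial: float  # model's predicted probability of microbial origin


def flag_candidate_symbiotic(compounds, threshold=0.5):
    """Return IDs of Animalia-labeled compounds that the model assigns to microbes.

    The default threshold of 0.5 is an illustrative assumption,
    not a setting reported in the paper.
    """
    return [
        c.compound_id
        for c in compounds
        if c.label == "Animalia" and c.p_microbial >= threshold
    ]


if __name__ == "__main__":
    data = [
        Compound("cmpd_A", "Animalia", 0.92),   # label and prediction disagree
        Compound("cmpd_B", "Animalia", 0.10),   # label and prediction agree
        Compound("cmpd_C", "Microbial", 0.88),  # already labeled microbial
    ]
    print(flag_candidate_symbiotic(data))  # ['cmpd_A']
```

Raising the threshold trades recall for precision: fewer compounds are flagged, but each flag reflects a more confident contradiction of the annotated label.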
Findings
The model achieves 85.56% balanced accuracy in predicting microbial origins of marine natural products.
3996 compounds whose predicted microbial origin contradicts their Animalia label are identified as potential symbiotic metabolites.
Interpretability analysis reveals biologically coherent structural patterns among misannotated compounds.
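Balanced accuracy is the mean of per-class recall, so a model cannot score well simply by favoring the majority origin class. The sketch below computes it from scratch. The class names and toy labels are hypothetical and are not drawn from the paper's data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy = mean of per-class recall over the classes in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # all samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy labels: 8 "microbial" and 2 "animalia".
    y_true = ["microbial"] * 8 + ["animalia"] * 2
    y_pred = ["microbial"] * 8 + ["microbial", "animalia"]
    print(balanced_accuracy(y_true, y_pred))  # 0.75 (plain accuracy would be 0.9)
```

In the toy example, plain accuracy (9/10) hides the fact that half of the minority class is misclassified. Balanced accuracy averages the two recalls (1.0 and 0.5) and exposes it.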
Abstract
Marine natural products (MNPs) are a diverse group of bioactive compounds with varied chemical structures, but their biological origins are often misannotated due to complex host–microbe symbiosis. Propagated through public databases, such errors hinder biosynthetic studies and AI-driven drug discovery. Here, we develop a structure-based workflow of origin classification and misannotation correction for marine datasets. Using CMNPD and NPAtlas compounds, we integrate a two-step cleaning strategy that detects label inconsistencies and filters structural outliers with a microbial-pretrained graph neural network. The optimized model achieves a balanced accuracy of 85.56% and identifies 3996 compounds whose predicted microbial origins contradict their Animalia labels. These putative symbiotic metabolites cluster within known high-risk taxa, and interpretability analysis reveals biologically…
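The two-step cleaning strategy can be sketched under simplifying assumptions. Step 1 groups records by a structure key and reports keys that carry more than one origin label across sources. The use of an InChIKey-like string as that key is an assumption. Step 2 drops compounds whose embedding lies unusually far from the centroid of its class. The embeddings stand in for vectors from a pretrained graph neural network, and the distance rule with cutoff `k` is a hypothetical substitute, not the paper's filtering criterion. All function names are illustrative.

```python
import math
from collections import defaultdict


def find_label_conflicts(records):
    """Step 1 sketch: structure keys that carry more than one origin label.

    records: iterable of (structure_key, source, label) tuples, where
    structure_key is assumed to be a canonical ID such as an InChIKey.
    """
    labels_by_key = defaultdict(set)
    for key, _source, label in records:
        labels_by_key[key].add(label)
    return {k: sorted(v) for k, v in labels_by_key.items() if len(v) > 1}


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_structural_outliers(embeddings, k=2.0):
    """Step 2 sketch: flag points far from the centroid of one class.

    embeddings: dict mapping compound ID -> list[float] (e.g. GNN embeddings);
    call once per origin class. A point is an outlier if its distance to the
    centroid exceeds mean + k * std of all distances (k=2.0 is illustrative).
    Returns (kept_ids, outlier_ids).
    """
    ids = list(embeddings)
    dim = len(embeddings[ids[0]])
    centroid = [sum(embeddings[i][d] for i in ids) / len(ids) for d in range(dim)]
    dists = {i: _euclidean(embeddings[i], centroid) for i in ids}
    mean = sum(dists.values()) / len(dists)
    std = math.sqrt(sum((x - mean) ** 2 for x in dists.values()) / len(dists))
    cutoff = mean + k * std
    kept = [i for i in ids if dists[i] <= cutoff]
    outliers = [i for i in ids if dists[i] > cutoff]
    return kept, outliers


if __name__ == "__main__":
    records = [
        ("KEY1", "CMNPD", "Animalia"),
        ("KEY1", "NPAtlas", "Microbial"),  # same structure, conflicting origin
        ("KEY2", "CMNPD", "Animalia"),
    ]
    print(find_label_conflicts(records))  # {'KEY1': ['Animalia', 'Microbial']}

    emb = {f"m{i}": [0.0, 0.0] for i in range(9)}  # tight toy cluster
    emb["m_far"] = [10.0, 0.0]                     # structurally atypical point
    kept, outliers = filter_structural_outliers(emb)
    print(outliers)  # ['m_far']
```

Because the centroid and spread include the outliers themselves, this mean-and-standard-deviation rule is easily distorted on small or heavily contaminated classes. A median-based rule, or a score taken directly from the pretrained model, would be a more robust choice.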
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Taxonomy
Topics: Microbial Natural Products and Biosynthesis · Machine Learning in Materials Science · Genomics and Phylogenetic Studies
