# Phylogeny‐Aware Metabologenomics Accurately Assigns Natural Products to Biosynthetic Gene Clusters

**Authors:** Judith Boldt, Christoph Porten, F. P. Jake Haeckl, Joachim J. Hug, Fabian Panter, Matthias Steglich, Joachim Wink, Jörg Overmann, Markus Göker, Daniel Krug, Rolf Müller, Ulrich Nübel

PMC · DOI: 10.1111/1751-7915.70298 · Microbial Biotechnology · 2026-01-14

## TL;DR

This paper introduces a phylogeny-aware metabologenomics method that uses the phylogeny of the producer strains to link biosynthetic gene clusters to their natural products more accurately than simple correlational analysis.

## Contribution

The novel contribution is a phylogeny-aware statistical approach that reduces spurious associations between gene cluster families and metabolite families by 33-fold relative to simple correlational analysis.

## Key findings

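- Across 72 sequenced Sorangium spp. (myxobacteria) genomes, the method identified 43 high-confidence associations between gene cluster families (GCFs) and metabolite families (MFs).
- It correctly included 89% of previously characterised links and reduced spurious associations by 33-fold compared to simple correlational analysis.
- The approach identified previously unknown biosynthetic gene clusters for rowithocin and for an undescribed poly-glycosylated natural product, identified a distinct BGC associated with chlorotonil C variants, and refined the BGC for maracen.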

## Abstract

Tens of thousands of biosynthetic gene clusters (BGCs) have been identified in microbial genomes, but the vast majority of associated natural products (NPs) and their underlying biosyntheses remain unknown. Metabologenomics approaches integrate genomic and metabolomic datasets to statistically associate BGCs to their cognate NPs, yet often suggest many false links. Here, we show that incorporating information on the producer strains' phylogeny greatly improves accuracy. We sequenced 72 Sorangium spp. genomes (myxobacteria), predicting 2030 BGCs in 265 gene cluster families (GCFs). Mass spectrometry (MS1) revealed 99 metabolite families (MFs) from the same strains. Using a phylogeny‐aware statistical analysis, we identified 43 high‐confidence associations between GCFs and MFs, correctly including 89% of previously characterised links and reducing spurious associations by 33‐fold, compared to simple correlational analysis. Our approach identified previously unknown BGCs for rowithocin and an undescribed poly‐glycosylated NP. It also identified a distinct BGC associated with the production of chlorotonil C variants and refined the BGC for maracen. This study demonstrates the effectiveness of phylogeny‐aware metabologenomics as a scalable strategy for NP discovery and biosynthetic pathway elucidation, and provides a roadmap to improved analyses of paired‐omics data towards NP discovery.
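The abstract does not spell out the statistic used, so the following is only a toy sketch of why strain phylogeny matters when associating a GCF with an MF. It assumes hypothetical presence/absence data and uses a one-sided Fisher's exact test, first on individual strains (simple co-occurrence) and then on one observation per clade. The clade collapsing is a crude stand-in chosen for illustration, not the authors' method, and all names and data here are invented.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1 = a + b  # observations carrying the GCF
    col1 = a + c  # observations producing the MF
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


def contingency(pairs):
    """Count (GCF+MF+, GCF+MF-, GCF-MF+, GCF-MF-) over (gcf, mf) boolean pairs."""
    a = sum(1 for g, m in pairs if g and m)
    b = sum(1 for g, m in pairs if g and not m)
    c = sum(1 for g, m in pairs if not g and m)
    d = sum(1 for g, m in pairs if not g and not m)
    return a, b, c, d


# Hypothetical data: (clade, GCF present, MF detected) per strain.
# GCF and MF co-occur, but only within one closely related clade.
strains = (
    [("A", True, True)] * 6
    + [("B", False, False)] * 3
    + [("C", False, False)] * 3
)

# Simple correlational view: every strain counts as an independent observation.
naive_p = fisher_one_sided(*contingency([(g, m) for _, g, m in strains]))

# Crude phylogeny-aware view (assumption): one observation per clade,
# marked present if any member strain carries the feature.
clades = {}
for clade, g, m in strains:
    seen_g, seen_m = clades.get(clade, (False, False))
    clades[clade] = (seen_g or g, seen_m or m)
clade_p = fisher_one_sided(*contingency(list(clades.values())))

print(f"strain-level p = {naive_p:.4f}")
print(f"clade-level  p = {clade_p:.4f}")
```

In this toy case the strain-level test looks convincing (p = 1/924) even though the co-occurrence traces back to a single clade, while the clade-level test (p = 1/3) does not support the link. Any GCF and MF inherited together within that clade would show the same strain-level pattern, which is the kind of spurious association a phylogeny-aware analysis is meant to suppress.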

We report a statistical approach for associating biosynthetic gene clusters with their cognate metabolites and demonstrate its unprecedented power for scalable natural product discovery. Notably, it drastically reduces spurious associations by explicitly integrating the phylogeny of microbial producer strains. Source: https://BioRender.com/rsv6wxn.

## Full-text entities

- **Chemicals:** chlorotonil C

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12800573/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12800573/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12800573/full.md

---
Source: https://tomesphere.com/paper/PMC12800573