# Structure of the Enterobacter pan-genome is revealed using machine learning

**Authors:** Joshua T. Burrows, Gaoyuan Li, Jonathan M. Monk, Siddharth M. Chauhan, Bernhard O. Palsson

PMC · DOI: 10.1128/spectrum.01922-25 · Microbiology Spectrum · 2025-12-15

## TL;DR

This study uses machine learning to analyze the genetic diversity of Enterobacter, revealing its pangenome structure and how genes are inherited across species and subspecies.

## Contribution

The study introduces a novel approach using non-negative matrix factorization to classify Enterobacter genomes based on gene content, revealing lineage and horizontal inheritance patterns.
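The factorization can be written compactly. The symbols below are assumptions chosen for illustration (the summary does not give the authors' notation), and the Frobenius-norm objective is the common textbook form of NMF rather than a confirmed detail of the paper:

$$
P \approx L A, \qquad
P \in \{0,1\}^{g \times n}, \quad
L \in \mathbb{R}_{\ge 0}^{g \times k}, \quad
A \in \mathbb{R}_{\ge 0}^{k \times n}
$$

$$
\min_{L \ge 0,\; A \ge 0} \; \lVert P - L A \rVert_F^2
$$

Here $P$ is the accessory-gene presence/absence matrix with $g$ genes and $n = 777$ genomes, $k = 31$ is the number of Phylons reported, column $j$ of $L$ weights the genes belonging to Phylon $j$, and column $s$ of $A$ gives the affinity of genome $s$ for each Phylon.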

## Key findings

- 31 Phylons were identified, representing 21 lineage-associated gene sets and 10 mobile genetic element-associated gene sets.
- The pangenome structure was used to classify 2,291 fragmented genome sequences and map traits like antimicrobial resistance and virulence factors.
- The resulting classification identifies differential genetic traits across Enterobacter species and subspecies, reducing some of the ambiguity in the taxonomy of the genus.
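One plausible way to place an additional genome into an existing Phylon structure is to hold the gene-by-Phylon matrix fixed and solve for non-negative weights for the new genome only. The sketch below does this with multiplicative updates on toy data. The function `project_genome`, the matrix `L_basis`, and the vector `p_fragment` are hypothetical names, and this summary does not confirm that the authors classified the 2,291 fragmented sequences this way.

```python
import numpy as np

def project_genome(L, p, n_iter=200, eps=1e-9):
    """Estimate non-negative weights a so that L @ a approximates p, with L held fixed."""
    a = np.full(L.shape[1], 0.5)      # positive starting weights
    LtL = L.T @ L
    Ltp = L.T @ p
    for _ in range(n_iter):
        # Multiplicative update: entries stay non-negative at every step.
        a *= Ltp / (LtL @ a + eps)
    return a

# Hypothetical basis: 6 accessory genes x 2 Phylons (illustrative values only).
L_basis = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
], dtype=float)

# A fragmented genome that carries only two of the three genes of the first Phylon.
p_fragment = np.array([1, 1, 0, 0, 0, 0], dtype=float)

weights = project_genome(L_basis, p_fragment)
best_phylon = int(np.argmax(weights))   # index of the Phylon with the largest weight
```

Because the basis is fixed, a missing gene lowers the weight of the matching Phylon but does not change which Phylon fits best, which is the property that makes this kind of projection attractive for incomplete assemblies.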

## Abstract

The growing availability of publicly accessible Enterobacter genomes offers an opportunity to reveal the structure of its pangenome, uncovering the catalog of genes across the genus and their distribution across the different species and subspecies of the genus. In this study, we analyze 777 high-quality complete Enterobacter genomes using a pangenome matrix. The accessory genome, consisting of the genes found in many, but not all, strains, was decomposed using non-negative matrix factorization (NMF) to identify groups of genes, called Phylons, that are found to be present across the subgroups of the genomes analyzed. The Phylons are representative of major modes of inheritance, both lineage-associated and horizontal, found across the pangenome. Using NMF, we defined 31 Phylons, representative of 21 lineage-associated gene sets, and 10 Phylons containing genes associated with mobile genetic elements. Six mobile Phylons were extrachromosomal, representing plasmids, and four were associated with chromosomal DNA. These 31 Phylons define the structure of the Enterobacter pangenome. This structure is consistent with the classification of an additional 2,291 fragmented genome sequences. This structure enables the pangenome-wide mapping of genetic traits, such as motility genes, biosynthetic gene clusters, antimicrobial resistance genes, and virulence factors. NMF thus enabled phylogenetic and functional classification of genomes based on the pangenome-scale assessment of a genome’s gene portfolio. A robust classification of Enterobacter spp. enhances the understanding of the evolution of this clinically significant pathogen.
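The decomposition described in the abstract can be illustrated on a toy presence/absence matrix. The following is a minimal sketch, not the authors' pipeline: it uses plain Lee-Seung multiplicative updates, the frequency cutoffs in `accessory_mask` are arbitrary assumptions chosen for the toy data, and all names (`accessory_mask`, `nmf`, `P_toy`) are hypothetical.

```python
import numpy as np

def accessory_mask(P, upper=0.99, lower=0.2):
    """Flag genes (rows) present in some, but not nearly all, genomes (columns).

    The cutoffs are illustrative assumptions, not values from the paper.
    """
    freq = P.mean(axis=1)             # fraction of genomes carrying each gene
    return (freq > lower) & (freq < upper)

def nmf(P, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor P (genes x genomes) into non-negative L (genes x rank) and A (rank x genomes)."""
    rng = np.random.default_rng(seed)
    L = rng.random((P.shape[0], rank)) + eps
    A = rng.random((rank, P.shape[1])) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates keep every entry non-negative.
        A *= (L.T @ P) / (L.T @ L @ A + eps)
        L *= (P @ A.T) / (L @ A @ A.T + eps)
    return L, A

# Toy pangenome matrix: 8 genes x 6 genomes, two lineages of three genomes each.
P_toy = np.array([
    [1, 1, 1, 1, 1, 1],   # core gene, present everywhere
    [1, 1, 1, 0, 0, 0],   # lineage 1 genes
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],   # lineage 2 genes
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],   # rare gene, present in one genome
], dtype=float)

mask = accessory_mask(P_toy)
accessory = P_toy[mask]                        # core and rare genes are dropped
L, A = nmf(accessory, rank=2)
labels = A.argmax(axis=0)                      # strongest component per genome
error = np.linalg.norm(accessory - L @ A)      # Frobenius reconstruction error
```

In this sketch each column of `L` plays the role of a Phylon (a weighted gene set) and each column of `A` describes how strongly a genome draws on each one. Real data would need a principled choice of rank and a rule for binarizing `L` and `A`, neither of which is specified in this summary.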

Enterobacter spp. are members of the ESKAPEE group of pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, Enterobacter species, and Escherichia coli), which are notable for their nosocomial pathogenicity and antimicrobial resistance. Understanding the genomic diversity of the genus is essential for further study of its evolution and resistance potential. We constructed a pangenome of 777 complete Enterobacter genomes. Machine learning techniques were used to mathematically define major subpopulations of Enterobacter based on their accessory gene content, which for the first time defined dominant modes of lineage-associated and horizontal inheritance. This analysis provides insights into the distribution of traits related to antimicrobial resistance, biosynthetic gene clusters, and virulence factors. This study provides a robust classification of Enterobacter isolates, identifying differential genetic traits across the species and subspecies of the genus and overcoming some of the ambiguity in its taxonomy.

## Linked entities

- **Species:** Enterobacter (taxon 547)

## Full-text entities

- **Species:** Klebsiella pneumoniae (species) [taxon 573], Enterobacter (genus) [taxon 547], Escherichia coli (E. coli, species) [taxon 562], Pseudomonas aeruginosa (species) [taxon 287], Staphylococcus aureus (species) [taxon 1280], Sagamiharavirus PP (species) [taxon 2956385], Acinetobacter baumannii (species) [taxon 470], Enterococcus faecium (species) [taxon 1352]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12889083/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12889083/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/PMC12889083/full.md

---
Source: https://tomesphere.com/paper/PMC12889083