# Machine learning-based lineage prediction from antimicrobial susceptibility testing phenotypes for Escherichia coli sequence type 131 clade C surveillance across infection types

**Authors:** Theodor A. Ross, Anna K. Pöntinen, Einar Holsbø, Ørjan Samuelsen, Kristin Hegstad, Michael Kampffmeyer, Jukka Corander, Rebecca A. Gladstone

PMC · DOI: 10.1099/mgen.0.001608 · Microbial Genomics · 2026-01-19

## TL;DR

This study trains machine learning classifiers on routine antimicrobial susceptibility test data to estimate the yearly prevalence of the drug-resistant E. coli ST131 clade C in urinary tract and bloodstream infections in Norway.

## Contribution

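A machine learning approach that predicts whether an E. coli isolate belongs to ST131-C from antimicrobial susceptibility test (AST) data alone, so that lineage prevalence can be estimated from surveillance data that lack genomic markers.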

## Key findings

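- The XGBoost classifier achieved an F1-score of over 70% when predicting ST131-C membership from AST data, on a highly unbalanced dataset in which only 4.3% of the genomic BSI isolates belonged to ST131-C.
- Predicted ST131-C prevalence followed a similar annual trend in UTIs and BSIs, which the authors interpret as ST131-C UTIs largely driving the persistence of ST131-C in BSIs.
- ST131-C prevalence was higher in BSIs (~7%) than in UTIs (~4%), suggesting a subsequent enrichment of ST131-C.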
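The yearly prevalence comparison behind these findings can be illustrated with a minimal sketch. It assumes each isolate has already been given a predicted ST131-C label; the function name `yearly_prevalence`, the record layout and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def yearly_prevalence(records):
    """Return {(infection_type, year): fraction of isolates predicted ST131-C}.

    `records` is an iterable of (infection_type, year, is_st131c) tuples.
    This layout is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # [predicted positives, total isolates]
    for infection_type, year, is_st131c in records:
        bucket = counts[(infection_type, year)]
        bucket[0] += int(is_st131c)
        bucket[1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}

# Toy records, NOT data from the paper: (infection type, year, predicted label).
toy_records = [
    ("BSI", 2010, True), ("BSI", 2010, False),
    ("BSI", 2010, False), ("BSI", 2010, False),
    ("UTI", 2010, True), ("UTI", 2010, False), ("UTI", 2010, False),
    ("UTI", 2010, False), ("UTI", 2010, False),
]

prevalence = yearly_prevalence(toy_records)
for (infection_type, year), fraction in sorted(prevalence.items()):
    print(f"{infection_type} {year}: {fraction:.1%}")
```

Grouping by both infection type and year keeps the two trend lines separate, so the UTI and BSI series can be compared year by year.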

## Abstract

Rising antimicrobial resistance (AMR) in Escherichia coli bloodstream infections (BSIs) in high-income settings has typically been dominated by one clone, the sequence type (ST)131. More specifically, ST131 clade C (ST131-C) is associated with fluoroquinolone resistance and extended-spectrum β-lactamases (ESBLs). Even though urinary tract infections (UTIs) are a known common precursor to BSIs, there is currently limited knowledge on the longitudinal prevalence of ST131-C in UTIs and, therefore, the temporal link between the two infection types. Leveraging available genomic and antimicrobial susceptibility test (AST) data for ciprofloxacin, gentamicin and ceftazidime in 2,790 E. coli BSI isolates, we trained Random Forest and extreme gradient boosting (XGBoost) classifiers to predict if an E. coli isolate belongs to ST131-C using only AST data. These models were used to predict the yearly prevalence of ST131-C in 22,942 UTI and 24,866 BSI isolates from Norway. The XGBoost classifier achieved a prediction F1-score of over 70% on a highly unbalanced dataset where only 4.3% of the genomic BSI isolates belonged to ST131-C. The predicted prevalence of ST131-C in UTIs exhibited a similar annual trend to that of BSIs, with a stable infection burden for 8 years after its rapid expansion, confirming that the persistence of ST131-C in BSIs is largely driven by ST131-C UTIs. However, a higher prevalence of ST131-C in BSIs (~7%) compared to UTIs (~4%) suggests a subsequent enrichment of ST131-C. Our study highlights how existing epidemiological knowledge can be supplemented by utilizing extensive data from AMR surveillance efforts without genomic markers.
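To see why the abstract reports an F1-score rather than accuracy, the sketch below works through the arithmetic for a dataset with the stated size (2,790 isolates) and class balance (4.3% ST131-C). The confusion-matrix counts for the second classifier are hypothetical, chosen only to illustrate the calculation; they are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

n_isolates = 2790                        # genomic BSI isolates (from the abstract)
n_positive = round(0.043 * n_isolates)   # about 4.3% belong to ST131-C
n_negative = n_isolates - n_positive

# A trivial classifier that never predicts ST131-C looks accurate but is useless.
baseline_accuracy = n_negative / n_isolates
baseline_f1 = f1_score(tp=0, fp=0, fn=n_positive)

# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn = 90, 40, 30
example_accuracy = (n_isolates - fp - fn) / n_isolates
example_f1 = f1_score(tp, fp, fn)

print(f"always-negative: accuracy={baseline_accuracy:.3f}, F1={baseline_f1:.2f}")
print(f"hypothetical:    accuracy={example_accuracy:.3f}, F1={example_f1:.2f}")
```

Because F1 ignores true negatives, it is not inflated by the large majority class, which is why it suits a dataset where fewer than one isolate in twenty is positive.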

## Linked entities

- **Chemicals:** ciprofloxacin (PubChem CID 2764), gentamicin (PubChem CID 3467), ceftazidime (PubChem CID 5481173)
- **Species:** Escherichia coli (taxon 562)

## Full-text entities

- **Diseases:** UTIs (MESH:D014552), BSIs (MESH:D018805), infection (MESH:D007239), Escherichia coli (MESH:D004927)
- **Chemicals:** ciprofloxacin (MESH:D002939), fluoroquinolone (MESH:D024841), ceftazidime (MESH:D002442), gentamicin (MESH:D005839)
- **Species:** Escherichia coli O25b:H4-ST131 (no rank) [taxon 941322], Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12816985/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12816985/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12816985/full.md

---
Source: https://tomesphere.com/paper/PMC12816985