# Global population structure of Shiga toxin-producing Escherichia coli O103:H2 and the variation in their major virulence factor-encoding genetic elements

**Authors:** Itsuki Taniguchi, Yo Morimoto, Yoko Kimura, Junji Seto, Yuko Kawai, Tomoko Kitahashi, Junko Aoki, Katsuya Terai, Toshihiko Furuta, Yuki Wakabayashi, Sumiko Tanabe, Mitsuhiro Hamasaki, Yuri Abe, Mari Sasaki, Hiroshi Narimatsu, Eiji Yokoyama, Sunao Iyoda, Tetsuya Hayashi, Keiji Nakamura

PMC · DOI: 10.1099/mgen.0.001625 · Microbial Genomics · 2026-01-23

## TL;DR

This study analyses 2,701 whole-genome sequences of E. coli O103:H2 to map the global population structure of Shiga toxin-producing E. coli (STEC) O103:H2 and to document variation in the genetic elements that carry its main virulence genes.

## Contribution

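The study provides a systematic whole-genome analysis of O103:H2 STEC, a major non-O157 serotype for which no such analysis had been conducted. It covers 2,701 genomes (193 newly sequenced) and compares seven closed genomes (four newly obtained) spanning all five clades.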

## Key findings

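- O103:H2 strains fall into three distinct E. coli lineages by sequence type; the typical STEC virulence marker genes (stx, eae and ehxA) occur only in the major lineage (n=2,658), which comprises ST17 and its single- and double-locus variants.
- The major lineage splits into five clades (C1–C5): C1 is ancestral, C2 and C3 emerged from C1, and C4 and C5 emerged from C3.
- stx1a, eae and ehxA are highly conserved across the STEC O103:H2 lineage, whereas the elements encoding them (the Stx1a phage, the locus of enterocyte effacement (LEE) and the virulence plasmid) vary markedly.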
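
The clade relationships reported in the abstract (C1 ancestral; C2 and C3 emerging from C1; C4 and C5 emerging from C3) can be written down as a small parent map. The following is only an illustrative sketch: the names `CLADE_PARENT`, `ancestry` and `descendants` are hypothetical and do not come from the paper, and the map encodes nothing beyond the emergence order stated in the abstract.

```python
# Parent of each clade, as stated in the abstract. C1 has no parent (ancestral).
# All names in this block are illustrative, not taken from the paper.
CLADE_PARENT = {"C1": None, "C2": "C1", "C3": "C1", "C4": "C3", "C5": "C3"}


def ancestry(clade):
    """Return the chain from a clade back to the ancestral clade."""
    path = [clade]
    while CLADE_PARENT[path[-1]] is not None:
        path.append(CLADE_PARENT[path[-1]])
    return path


def descendants(clade):
    """Return every clade that emerged, directly or indirectly, from `clade`."""
    return sorted(
        other for other in CLADE_PARENT
        if other != clade and clade in ancestry(other)
    )


c5_path = ancestry("C5")
c3_children = descendants("C3")
```

Here `ancestry("C5")` walks C5 back through C3 to C1, and `descendants("C3")` lists C4 and C5.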

## Abstract

Shiga toxin (Stx)-producing Escherichia coli (STEC) is a major cause of serious gastrointestinal illness, including diarrhoea, haemorrhagic colitis and life-threatening haemolytic-uraemic syndrome. Although O157:H7 STEC strains are the most prevalent, the incidence of STEC infections caused by several other serotypes has recently increased. O103:H2 STEC is one of these major non-O157 STEC strains, but systematic whole-genome sequence (WGS) analyses have not yet been conducted. To gain a global phylogenetic overview of O103:H2 STEC based on WGSs, we analysed 2,701 WGSs of O103:H2 strains, including 193 sequenced in this study. Sequence type (ST)-based classification divided the O103:H2 strains into three distinct E. coli lineages. As the virulence marker genes of typical STECs (stx, eae and ehxA) were found only in the major O103:H2 lineage (n=2,658) comprising ST17 and its single- and double-locus variants, we performed a global phylogenetic analysis of the major lineage. This analysis revealed that this lineage was divided into five clades (C1–C5) and that C1 was the ancestral clade, C2 and C3 emerged from C1 and C4 and C5 emerged from C3. While stx2 genes were sporadically distributed in limited STEC O103:H2 strains, stx1a, eae and ehxA were highly conserved throughout the entire STEC O103:H2 lineage. However, through a detailed comparison of seven closed genomes of STEC strains, covering the five clades and including four obtained in this study, we found marked variation in the genetic elements encoding the virulence genes (Stx1a phage, the locus of enterocyte effacement (LEE) and the virulence plasmid), such as rearrangement in the LEE accessory region, a shift in the integration sites of the Stx1a phage due to the replacement of the integrase gene-containing genomic segments, the replacement of the virulence plasmid and the gain and loss of virulence-related genes in the virulence plasmid. 
Overall, this study highlights the current global population structure of O103:H2 strains and provides evolutionary insights into the variation in virulence determinants within STEC O103:H2, which is relatively understudied among the major STEC lineages.
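
The abstract defines the major lineage as ST17 plus its single- and double-locus variants (SLVs and DLVs). As a rough sketch of what that grouping means, the code below counts the loci at which an allele profile differs from a founder profile. It assumes a seven-locus MLST scheme (the Achtman E. coli loci are used as an assumption; the abstract does not name the scheme), and every allele number shown is a made-up placeholder, not the real ST17 profile. The function names are hypothetical.

```python
# Loci of a seven-gene MLST scheme (assumed: the Achtman E. coli scheme).
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def locus_differences(profile_a, profile_b):
    """Count loci at which two allele profiles carry different allele numbers."""
    return sum(1 for locus in LOCI if profile_a[locus] != profile_b[locus])


def classify_against_founder(profile, founder):
    """Label a profile by how many loci differ from the founder ST."""
    n = locus_differences(profile, founder)
    if n == 0:
        return "founder"
    if n == 1:
        return "SLV"  # single-locus variant
    if n == 2:
        return "DLV"  # double-locus variant
    return "other"


# Placeholder allele numbers for illustration only (NOT the real ST17 profile).
founder = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1)))
examples = {
    "same_as_founder": dict(founder),
    "one_locus_changed": {**founder, "icd": 2},
    "two_loci_changed": {**founder, "icd": 2, "recA": 5},
    "three_loci_changed": {**founder, "adk": 9, "icd": 2, "recA": 5},
}
labels = {
    name: classify_against_founder(profile, founder)
    for name, profile in examples.items()
}
```

Under this definition, a strain would belong to the grouping described in the abstract if its label is `founder`, `SLV` or `DLV`; profiles differing at three or more loci fall outside it.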

## Linked entities

- **Genes:** ST8SIA2 (ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 2) [NCBI Gene 8128], eae (T3SS intimin) [NCBI Gene 915471], STX1A (syntaxin 1A) [NCBI Gene 6804]
- **Diseases:** diarrhoea (MONDO:0001673)
- **Species:** Escherichia coli (taxon 562)

## Full-text entities

- **Genes:** ST8SIA2 (ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 2) [NCBI Gene 8128] {aka HsT19690, SIAT8-B, SIAT8B, ST8SIA-II, ST8SiaII, STX}, ehxA [NCBI Gene 3654480], TEM-1 beta-lactamase [NCBI Gene 13905334]
- **Diseases:** diarrhoea (MESH:D003967), enteritis (MESH:D004751), EPEC (MESH:D004927), gastrointestinal illness (MESH:D005767), IS (MESH:C538388), haemorrhagic colitis (MESH:D006470), haemolytic uraemic syndrome (MESH:D006463), BAPS (MESH:C537210), STEC infections (MESH:D007239), DLVs (MESH:D005671)
- **Chemicals:** streptomycin (MESH:D013307), sulphonamide (MESH:D013449), CC17 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Sus scrofa (pig, species) [taxon 9823], Bos taurus (bovine, species) [taxon 9913], Escherichia coli (E. coli, species) [taxon 562], Escherichia coli O103:H2 (no rank) [taxon 376725], Petrachloros mirabilis (species) [taxon 2918835], Escherichia coli O157:H7 (no rank) [taxon 83334], Otitesella sp. 26 (species) [taxon 257750]
- **Cell lines:** CEC12044 — Homo sapiens (Human), Transformed cell line (CVCL_9605), PV16-126 — Homo sapiens (Human), Xeroderma pigmentosum, complementation group D, Finite cell line (CVCL_RU39), O103 — Mus musculus (Mouse), Hybridoma (CVCL_L845), pO157 — Homo sapiens (Human), Xeroderma pigmentosum-Cockayne syndrome complex, Finite cell line (CVCL_U690)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12831630/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12831630/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12831630/full.md

---
Source: https://tomesphere.com/paper/PMC12831630