# The transcriptional regulatory network of the Escherichia coli MG1655 reference strain

**Authors:** Heera Bajpe, Jongoh Shin, Ying Hefner, Richard Szubin, Jaemin Sung, Yuan Yuan, Bernhard O Palsson

PMC · DOI: 10.1093/nar/gkag059 · Nucleic Acids Research · 2026-02-02

## TL;DR

This paper builds a transcriptomic knowledgebase for wild-type E. coli MG1655 from 584 RNA-seq samples spanning 45 carbon sources and 10 base media, and uses independent component analysis to extract 115 iModulons that describe how genes are co-regulated across conditions.

## Contribution

The study introduces a single-strain, wild-type transcriptomic knowledgebase for E. coli MG1655 and identifies 115 iModulons (independently modulated sets of genes) to characterize its transcriptional regulatory network.

## Key findings

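- iModulons explain 75% of the variance in the dataset through knowledge enrichment.
- 67% of iModulons are associated with a single dominant regulator or a combination of dominant regulators.
- iModulon activity profiles of samples reveal patterns within the transcriptional regulatory network, such as differences in aerobicity.
- Including transcriptomic data from non-wild-type strains changes iModulon gene membership, which highlights the malleability of the network.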

## Abstract

The growth of RNA sequencing (RNA-seq) data accompanied by the development of novel scalable data analytic methods has revealed a deep understanding of the composition of bacterial transcriptomes. This new, first-biological-principles understanding has enabled a novel characterization of the function of the transcriptional regulatory network. Here, we present a single-strain wild-type transcriptomic knowledgebase for the model strain Escherichia coli MG1655. The associated transcriptomic compendium consists of 584 high-quality RNA-seq samples from wild-type E. coli MG1655 generated using a single protocol. These samples range over a wide condition space, including 45 carbon sources and 10 base media. Using independent component analysis, we decomposed the transcriptomic compendium to extract 115 independently modulated sets of genes (iModulons). We find that (i) iModulons explain 75% of variance in the dataset through knowledge enrichment; (ii) 67% of iModulons are associated with single/combined dominant regulators; (iii) iModulon activity profiles of samples can be utilized to elucidate patterns within the transcriptional regulatory network, such as differences in aerobicity; and (iv) the use of transcriptomic data derived from non-wild-type strains results in changes in iModulon gene membership, highlighting the malleability of the transcriptional regulatory network. Altogether, this knowledgebase serves as a resource for multi-scale knowledge mining for transcriptional regulation in E. coli MG1655.
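The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `FastICA` as a stand-in for the authors' pipeline, which this summary does not describe in detail. The data is a small synthetic matrix, not the 584-sample compendium. The helper names (`make_toy_compendium`, `decompose`, `explained_variance`, `member_genes`) and the fixed 3-standard-deviation membership cutoff are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.decomposition import FastICA


def make_toy_compendium(n_genes=200, n_samples=40, n_modules=3, seed=0):
    """Build a synthetic genes x samples matrix from sparse gene modules (toy data)."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((n_genes, n_modules))
    for k in range(n_modules):
        # Each toy module gets its own disjoint block of 10 genes.
        weights[10 * k:10 * (k + 1), k] = rng.uniform(2.0, 4.0, size=10)
    activities = rng.normal(size=(n_modules, n_samples))
    noise = 0.05 * rng.normal(size=(n_genes, n_samples))
    return weights @ activities + noise


def decompose(X, n_components, seed=0):
    """Return (M, A, Xc): gene weights, sample activities, and the centered data."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    # scikit-learn treats rows as observations, so genes play that role here.
    M = ica.fit_transform(X)   # genes x components
    A = ica.mixing_.T          # components x samples
    Xc = X - ica.mean_         # centered the same way FastICA centered it
    return M, A, Xc


def explained_variance(Xc, M, A):
    """Fraction of variance captured by each component alone and by all together."""
    total = np.sum(Xc ** 2)
    per_component = np.array([
        1.0 - np.sum((Xc - np.outer(M[:, k], A[k])) ** 2) / total
        for k in range(M.shape[1])
    ])
    overall = 1.0 - np.sum((Xc - M @ A) ** 2) / total
    return per_component, overall


def member_genes(M, n_std=3.0):
    """Indices of genes whose weight lies far from the bulk (assumed fixed cutoff)."""
    cutoff = n_std * M.std(axis=0)
    return [np.flatnonzero(np.abs(M[:, k]) > cutoff[k]) for k in range(M.shape[1])]


X = make_toy_compendium()
M, A, Xc = decompose(X, n_components=3)
per_component, overall = explained_variance(Xc, M, A)
print("explained variance per component:", per_component.round(3))
print("explained variance overall:", round(float(overall), 3))
print("member genes:", [genes.tolist() for genes in member_genes(M)])
```

In this sketch each column of `M` plays the role of one iModulon's gene weights, and the matching row of `A` is its activity across samples. The explained-variance calculation here is one plausible reading of "variance explained" and may differ from how the paper computes its 75% figure.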

## Full-text entities

- **Genes:** ygbI (DNA-binding transcriptional repressor YgbI) [NCBI Gene 947204] {aka ECK2730}, rpoD (RNA polymerase sigma factor RpoD) [NCBI Gene 947567] {aka ECK3057, alt}, arcA (DNA-binding transcriptional dual regulator ArcA) [NCBI Gene 948874] {aka ECK4393, cpxC, dye, fexA, msp, seg}, ydfZ (putative selenoprotein YdfZ) [NCBI Gene 948796] {aka ECK1534}, yeaE (methylglyoxal reductase YeaE) [NCBI Gene 946302] {aka ECK1779}, dmlA (D-malate/3-isopropylmalate dehydrogenase (decarboxylating)) [NCBI Gene 946319] {aka ECK1798, ttuC, yeaU}, lysP (lysine:H(+) symporter) [NCBI Gene 946667] {aka ECK2149, cadR}, ydiY (acid-inducible putative outer membrane protein YdiY) [NCBI Gene 946218] {aka ECK1720}, soxS (DNA-binding transcriptional dual regulator SoxS) [NCBI Gene 948567] {aka ECK4054}, rpoS (RNA polymerase sigma factor RpoS) [NCBI Gene 947210] {aka ECK2736, abrD, appR, csi2, dpeB, katF}
- **Chemicals:** ROS (MESH:D017382), serine (MESH:D012694), sorbitol (MESH:D013012), carbon (MESH:D002244), Phosphate (MESH:D010710), asparagine (MESH:D001216), LPS (MESH:D008070), 2,2'-dipyridyl (MESH:D015082), glutarate (MESH:D005977), tricarboxylic acid (MESH:D014233), amino acid (MESH:D000596), glycine-betaine (MESH:D001622), leucine (MESH:D007930), Sugar (MESH:D000073893), glucose (MESH:D005947), tryptophan (MESH:D014364), sodium dodecyl sulfate (MESH:D012967), N (MESH:D009584), valine (MESH:D014633), fatty acid (MESH:D005227), glycolate (MESH:C031149), cysteine (MESH:D003545), paraquat (MESH:D010269), H2S (MESH:D006862), rhamnose (MESH:D012210), M9-glucose (-)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Escherichia coli str. K-12 substr. MG1655 (no rank) [taxon 511145], Parundibacterium terreum (species) [taxon 1224302], Escherichia coli K-12 (strain) [taxon 83333], Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Escherichia coli BW25113 (no rank) [taxon 679895]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12862375/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12862375/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12862375/full.md

---
Source: https://tomesphere.com/paper/PMC12862375