# Activity-Cloud Organization of Shine–Dalgarno Sequences to Guide Translation Engineering in Escherichia coli

**Authors:** Pavel Zach, Yadira Boada, Jesús Pico, Alejandro Vignoni

PMC · DOI: 10.1021/acsomega.5c09784 · ACS Omega · 2026-02-01

## TL;DR

This paper introduces a framework that groups Shine–Dalgarno (SD) core variants into activity clouds, sets of variants with consistent expression levels, to make translation tuning in E. coli more predictable.

## Contribution

A data-driven framework organizes Shine–Dalgarno (SD) core variants into activity clouds for interpretable and predictable translation control in E. coli.

## Key findings

- Activity clouds group SD core variants with consistent expression levels, reducing variability from flanking DNA/RNA context.
- A bidirectional workflow supports both selecting candidate SD cores for a target effective translation rate (ETR) and predicting the expected ETR range from a given core.
- Cloud stability and predictive utility were validated using an independent high-throughput dataset.
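
The bidirectional workflow in the second finding can be pictured as a pair of lookups over a table of clouds. The following is a minimal sketch, not the authors' implementation: the cloud names, the effective translation rate (ETR) ranges, and the assignment of cores to clouds are made-up placeholder values, and `predict_etr_range` and `candidate_cores` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: cloud name -> (ETR range, member SD cores).
# All values are illustrative placeholders, NOT measurements from the paper.
CLOUDS: Dict[str, Tuple[Tuple[float, float], List[str]]] = {
    "low": ((0.0, 0.2), ["CTTCTC", "TTTCTT"]),
    "medium": ((0.2, 0.6), ["AGGACA", "TGGAGA"]),
    "high": ((0.6, 1.0), ["AGGAGG", "AGGAGA"]),
}


def predict_etr_range(core: str) -> Tuple[float, float]:
    """Forward direction: SD core -> expected ETR range of its cloud."""
    core = core.upper()
    for etr_range, members in CLOUDS.values():
        if core in members:
            return etr_range
    raise KeyError(f"core {core!r} is not assigned to any cloud")


def candidate_cores(target_etr: float) -> List[str]:
    """Reverse direction: target ETR -> cores of every cloud whose range covers it."""
    hits: List[str] = []
    for (low, high), members in CLOUDS.values():
        if low <= target_etr <= high:
            hits.extend(members)
    return hits


print(predict_etr_range("AGGAGG"))
print(candidate_cores(0.4))
```

Returning a range rather than a point estimate mirrors the coarse-grained intent described in the summary: the cloud narrows the expected ETR, and finer adjustment is left to the flanking context.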

## Abstract

Modifying the six-nucleotide Shine–Dalgarno (SD) core motif inside the ribosome binding site (RBS) constitutes a straightforward approach for tuning bacterial translation. However, existing methods for adjusting the effective translation rate (ETR) lack predictability. Even single-nucleotide substitutions can induce substantial alterations in translation efficiency. Moreover, this unpredictability is exacerbated by variations in the leader sequence, spacer region, or coding context. By focusing on the SD core as a key, experimentally tunable determinant of translation initiation in Escherichia coli, we introduce a coarse-grained framework that organizes SD core variants into activity clouds with consistent expression levels. This representation converts a dense sequence-to-phenotype map into an interpretable design space for coarse-grained tuning of expression. In contrast to thermodynamic tools such as the RBS Calculator, which estimate initiation from biophysical parameters, our approach is data-driven and emphasizes (i) interpretable rules over nucleotide positions, (ii) a bidirectional workflow (core → expected ETR range; target ETR → candidate cores), and (iii) simple paths between clouds that suggest minimal sequence edits. Designed with the needs of research teams in mind, our workflow prioritizes fixing the SD core first (i.e., selecting an appropriate activity cloud) to substantially narrow the spread of observed ETRs across constructs. This dampens variability introduced by flanking DNA/RNA context (leader, spacer, local secondary structure), so that subsequent fine-tuning is simpler, cheaper, and more predictable. We validate cloud stability and predictive utility using an independent high-throughput data set. Our approach provides a solid foundation for fast, interpretable coarse control of expression, while fine-grained tuning can then be achieved through flanking-region edits that account for spacing and local structure. Finally, we provide an open web interface and repository, allowing researchers to explore the hierarchy, inspect positional influences, and export candidate cores. Together, these contributions advance the Bonde et al. data set from a static lookup into a portable, actionable map for SD core guided tuning of translation in E. coli, and outline a path to extend the idea to a fully functional framework, with possible applications also to other bacteria.
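
Point (iii) of the abstract, paths between clouds that suggest minimal sequence edits, can be illustrated with a nearest-neighbor search over six-nucleotide cores. This is only a sketch under stated assumptions: it uses Hamming distance (number of substituted positions) as the notion of "minimal edit", which the abstract does not specify, and the cloud membership and the helper names `hamming` and `minimal_edit` are hypothetical.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length cores differ."""
    if len(a) != len(b):
        raise ValueError("cores must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimal_edit(
    core: str, target_cloud: List[str]
) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Pick the target-cloud core closest to `core` and list the substitutions.

    Each substitution is (1-based position, old base, new base).
    Ties are resolved in favor of the first core listed in the cloud.
    """
    best = min(target_cloud, key=lambda c: hamming(core, c))
    edits = [(i + 1, x, y) for i, (x, y) in enumerate(zip(core, best)) if x != y]
    return best, edits


# Hypothetical target cloud; membership is a placeholder, not data from the paper.
high_cloud = ["AGGAGG", "AGGAGA"]
best, edits = minimal_edit("AGGACA", high_cloud)
print(best, edits)
```

Here the starting core differs from `AGGAGA` at a single position, so the sketch proposes one substitution rather than the two needed to reach `AGGAGG`.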

## Linked entities

- **Species:** Escherichia coli (taxon 562)

## Full-text entities

- **Genes:** ESCO2 (establishment of sister chromatid cohesion N-acetyltransferase 2) [NCBI Gene 157570] {aka 2410004I17Rik, EFO2, EFO2p, JHS, RBS, hEFO2}
- **Diseases:** ASD (MESH:D006679), CLUSTER (MESH:D003027)
- **Chemicals:** eps (MESH:C100219), thymine (MESH:D013941), guanine (MESH:D006147), C (MESH:D002244), cytosine (MESH:D003596), nucleotide (MESH:D009711), Adenine (MESH:D000225), DBSCAN (-), Guanine nucleotide (MESH:D006150), dinucleotide (MESH:D015226)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Escherichia coli K-12 (strain) [taxon 83333], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** MG1655 — Homo sapiens (Human), Maple syrup urine disease, Transformed cell line (CVCL_D514)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12917692/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12917692/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12917692/full.md

---
Source: https://tomesphere.com/paper/PMC12917692