Generating information-dense promoter sequences with optimal string packing
Virgile Andreani, Eric J. South, Mary J. Dunlop, Stefan Klumpp, Stacey D. Finley

TL;DR
This paper introduces a computational method to design DNA sequences with densely packed binding sites, enabling efficient creation of synthetic promoters for studying gene regulation.
Contribution
The novel contribution is a provably optimal algorithm for packing DNA binding sites into short sequences using integer linear programming.
Findings
The nucleotide String Packing Problem (SPP) is NP-hard; reducing it to an Orienteering Problem with integer distances makes it efficient to solve for sequences spanning hundreds of base pairs.
The method packs 20–100 binding sites into 50–300 base pair sequences in seconds with provable optimality.
The approach supports designing bacterial promoters with fixed sequence elements and controlling how often each binding site is used.
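To make the packing idea in the findings above concrete, here is a minimal sketch of the orienteering view. It is not the authors' solver. It assumes that the integer distance from site a to site b is the number of bases that must be appended after a so that the sequence ends with b, it considers a single strand only, and it uses exhaustive search where the paper uses integer linear programming. The names `overlap` and `pack_sites` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def pack_sites(sites: list[str], budget: int) -> tuple[int, str]:
    """Exhaustively search for an ordering of sites that fits the most
    sites into a sequence of at most `budget` bases.

    Assumption (illustrative only): each site is a node, and moving from
    site a to site b costs len(b) - overlap(a, b) bases. This is exact
    search over orderings, so it only scales to a handful of sites.
    """
    best = (0, "")

    def extend(seq: str, last: str, used: frozenset) -> None:
        nonlocal best
        if len(used) > best[0]:
            best = (len(used), seq)
        for i, site in enumerate(sites):
            if i in used:
                continue
            k = overlap(last, site)
            cost = len(site) - k  # integer "distance" from last to site
            if len(seq) + cost <= budget:
                extend(seq + site[k:], site, used | {i})

    for i, site in enumerate(sites):
        if len(site) <= budget:
            extend(site, site, frozenset({i}))
    return best


count, packed = pack_sites(["ACGT", "GTAC", "TACG"], budget=6)
print(count, packed)
```

On this toy input, three 4-base sites fit into 6 bases because consecutive sites share 3 bases. The search space grows factorially with the number of sites, which is why an exact method at the scale reported above needs a stronger formulation than this brute-force sketch.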
Abstract
Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then…
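Because the abstract refers to double-stranded DNA, a site can be counted as present when it occurs on either strand. The sketch below illustrates one way to check a candidate sequence against a set of sites under that assumption. The helper names (`reverse_complement`, `sites_present`) are hypothetical, and exact string matching stands in for whatever binding-site model the paper uses.

```python
# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def sites_present(sequence: str, sites: list[str]) -> list[str]:
    """List the sites that occur as exact substrings on either strand.

    Assumption (illustrative only): a binding site is an exact string,
    and a match on the reverse strand counts the same as a forward match.
    """
    reverse = reverse_complement(sequence)
    return [s for s in sites if s in sequence or s in reverse]


candidate = "GTACGT"
found = sites_present(candidate, ["ACGT", "GTAC", "TACG", "CGTA", "AAAA"])
print(found, len(found) / len(candidate))  # sites found, sites per base
```

Here "CGTA" is absent from the forward strand but present on the reverse strand, so it still counts toward the density of the candidate.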
Taxonomy
Topics: RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics · Bacteriophages and microbial interactions
