Accurate prediction of gene expression by integration of DNA sequence statistics with detailed modeling of transcription regulation
Jose M. G. Vilar

TL;DR
This paper presents an integrated method combining DNA sequence statistics with biophysical modeling to accurately predict gene expression levels across various conditions, exemplified on the lac operon.
Contribution
It introduces a middle-ground approach that quantitatively bridges two extremes: biophysical models of single-domain binding free energies and empirical rules that combine DNA motifs to predict genome-scale expression trends.
Findings
Predicts lac operon transcriptional activity within 0.3-fold accuracy
Predictions span a 10,000-fold range of gene expression
Integrates statistical sequence data with detailed transcription regulation models
Abstract
Gene regulation involves a hierarchy of events that extend from specific protein-DNA interactions to the combinatorial assembly of nucleoprotein complexes. The effects of DNA sequence on these processes have typically been studied based either on its quantitative connection with single-domain binding free energies or on empirical rules that combine different DNA motifs to predict gene expression trends on a genomic scale. The middle-point approach that quantitatively bridges these two extremes, however, remains largely unexplored. Here, we provide an integrated approach to accurately predict gene expression from statistical sequence information in combination with detailed biophysical modeling of transcription regulation by multidomain binding on multiple DNA sites. For the regulation of the prototypical lac operon, this approach predicts within 0.3-fold accuracy transcriptional…
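As a rough illustration of what combining sequence statistics with a biophysical model can look like, the sketch below scores a candidate binding site against a small set of aligned sites and feeds that score into a single-site thermodynamic repression formula. This is not the paper's model, which treats multidomain binding on multiple DNA sites. The toy sequences, the function names, the pseudocount, the uniform background, and the choice to read a log-odds score directly as an energy in units of kT are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for i in range(length):
        # Pseudocounts keep unseen bases from producing log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1.0
        total = sum(counts.values())
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs

def statistical_energy(seq, freqs, background=0.25):
    """Negative log-odds score, interpreted (assumption) as energy in kT.

    Lower values mean the sequence is closer to the aligned sites.
    """
    return -sum(math.log(freqs[i][b] / background) for i, b in enumerate(seq))

def fold_repression(energy_kt, repressor_conc, reference_conc=1.0):
    """Unrepressed / repressed activity for one site that blocks the promoter.

    Two-state occupancy model: the site is either free (weight 1) or bound
    (weight = relative concentration * exp(-energy)).
    """
    weight = (repressor_conc / reference_conc) * math.exp(-energy_kt)
    return 1.0 + weight

# Hypothetical aligned sites, invented for illustration only.
toy_sites = ["AATTGT", "AATTGT", "AATTGA", "TATTGT"]
toy_freqs = position_frequencies(toy_sites)

strong = statistical_energy("AATTGT", toy_freqs)  # matches the toy motif
weak = statistical_energy("CCGGAC", toy_freqs)    # mismatches at every position

print("strong site fold repression:", fold_repression(strong, 10.0))
print("weak site fold repression:", fold_repression(weak, 10.0))
```

In this toy setup the site that matches the aligned sequences receives a lower energy and therefore a larger predicted repression than the mismatched site. A model closer to the one described in the abstract would need additional states for multiple sites and multidomain binding.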
