Informational blueprints reveal condition-dependent gene regulatory architectures
Doruk Efe Gökmen, Rosalind Wenshan Pan, Tom Röschinger, Stephen Quake, Hernan Garcia, Rob Phillips, Vincenzo Vitelli

TL;DR
This paper introduces an informational-blueprint algorithm that identifies condition-dependent transcription factor (TF) binding sites by compressing the global sequence information of an entire promoter, validated on E. coli data.
Contribution
The study presents a novel computational method inspired by renormalisation-group techniques to detect gene regulatory elements across different environmental conditions.
Findings
Successfully identified known TF binding sites in E. coli.
Discovered novel regulatory elements responsive to environmental changes.
Validated the approach's scalability across multiple growth conditions.
Abstract
While coding regions in the genome have a direct interpretation in terms of protein products, significant fractions are non-coding and yet control essential biological functions. Unlike the genetic code, there is no "lookup table" that identifies where regulatory proteins, known as transcription factors (TFs), bind. Here, we extract these binding sites by distilling sequences of nucleotide letters into collective coordinates (hyperletters) representing the binding sites that are active under specific environmental conditions. Going beyond local information footprints between individual bases and expression levels, our algorithm compresses the global information by optimising filters that simultaneously scan an entire promoter sequence. Inspired by renormalisation-group techniques, we identify TF binding sites as coarse-grained variables combining groups…
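For context, the "local information footprint" that the abstract contrasts with can be sketched as a per-position mutual information between mutation status and expression. The code below is a minimal illustration of that baseline only, not the authors' hyperletter algorithm. The function name `information_footprint`, the quantile binning of expression, and the toy data are all assumptions made for this example.

```python
import numpy as np

def information_footprint(sequences, expression, wild_type, n_bins=2):
    """Mutual information (bits) between mutation status at each position
    and binned expression. Hypothetical helper, for illustration only."""
    seqs = np.array([list(s) for s in sequences])
    wt = np.array(list(wild_type))
    mutated = (seqs != wt).astype(int)            # shape (n_seqs, length)

    # Assumption: expression is discretised into equal-population quantile bins.
    expr = np.asarray(expression, dtype=float)
    edges = np.quantile(expr, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(expr, edges)               # values in 0..n_bins-1

    length = seqs.shape[1]
    footprint = np.zeros(length)
    for i in range(length):
        # Joint distribution over (mutated?, expression bin) at position i.
        joint = np.zeros((2, n_bins))
        np.add.at(joint, (mutated[:, i], bins), 1.0)
        joint /= joint.sum()
        p_mut = joint.sum(axis=1, keepdims=True)  # marginal over mutation status
        p_expr = joint.sum(axis=0, keepdims=True) # marginal over expression bin
        nz = joint > 0                            # skip empty cells (0 * log 0 = 0)
        footprint[i] = np.sum(joint[nz] * np.log2(joint[nz] / (p_mut @ p_expr)[nz]))
    return footprint

# Toy data (made up): a mutation at position 2 lowers expression,
# while a mutation at position 7 has no effect.
wild_type = "ACGTACGT"
sequences = ["ACGTACGT", "ACGTACGA", "ACTTACGT", "ACTTACGA"]
expression = [10.0, 10.0, 1.0, 1.0]
fp = information_footprint(sequences, expression, wild_type)
```

In this toy example the footprint peaks at position 2 and is zero elsewhere. Each position is scored independently, which is the limitation the abstract points to when it describes filters that scan the whole promoter at once.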
