Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively
Jacob Clifford, Christoph Adami

TL;DR
This paper introduces a probabilistic model for transcription factor binding sites that accounts for cooperative interactions with nearby sites, improving prediction accuracy and quantifying information content in DNA sequences.
Contribution
It develops conditional PWMs based on flanking sequences, revealing cooperative binding patterns and enhancing detection of transcription factor sites.
Findings
Dorsal sites that cooperate with nearby Twist sites contain, on average, about 0.5 bits of information about the presence of Twist sites in the flanking sequence.
Conditional models outperform traditional PWMs in predicting binding sites.
Cooperative binding sites exhibit measurable information content.
Abstract
Transcription factor binding to DNA regulatory regions is one of the primary mechanisms regulating gene expression levels. A common probabilistic approach to modeling protein-DNA interactions at the sequence level uses Position Weight Matrices (PWMs), which estimate the joint probability of a DNA binding site sequence by assuming positional independence within the sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the…
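The pooling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pwm`, `conditional_pwms`, `log_odds`, `flank_information`), the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration. The information estimate is a simple plug-in sum of per-position mutual information between the base and the flanking label, which may differ from the estimator used in the paper.

```python
import math
from collections import Counter

BASES = "ACGT"

def pwm(sites, pseudocount=1.0):
    """Per-position base probabilities, assuming positional independence."""
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix

def conditional_pwms(sites, labels, pseudocount=1.0):
    """Pool sites that share a flanking label and fit one PWM per pool."""
    pools = {}
    for site, label in zip(sites, labels):
        pools.setdefault(label, []).append(site)
    return {label: pwm(pool, pseudocount) for label, pool in pools.items()}

def log_odds(site, matrix, background=0.25):
    """Log2 likelihood ratio of a site under a PWM versus a uniform background."""
    return sum(math.log2(matrix[i][b] / background) for i, b in enumerate(site))

def flank_information(sites, labels):
    """Plug-in estimate (bits) of the summed per-position mutual information
    between the base at each site position and the flanking label."""
    n = len(sites)
    label_counts = Counter(labels)
    bits = 0.0
    for i in range(len(sites[0])):
        base_counts = Counter(site[i] for site in sites)
        joint = Counter((site[i], lab) for site, lab in zip(sites, labels))
        for (base, lab), c in joint.items():
            bits += (c / n) * math.log2(c * n / (base_counts[base] * label_counts[lab]))
    return bits

# Hypothetical toy data: four 4-bp sites, each labelled by whether a
# partner motif is present ("twist") or absent ("none") in its flank.
toy_sites = ["GGAA", "GGAA", "GGTA", "GGTA"]
toy_labels = ["twist", "twist", "none", "none"]

models = conditional_pwms(toy_sites, toy_labels)
print(log_odds("GGAA", models["twist"]), log_odds("GGAA", models["none"]))
print(flank_information(toy_sites, toy_labels))
```

In this toy set only the third position differs between the two pools, so the conditional PWM for the "twist" pool scores `GGAA` higher than the "none" pool does. With real data the pools would come from annotated sites and their flanking sequences, and small pools would make the plug-in information estimate biased upward.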
