Sequence-based modeling of low-affinity transcription factor–DNA binding through deep learning
Yingfei Wang, Jinsen Li, Tsu-Pei Chiu, Beibei Xin, Remo Rohs

TL;DR
This paper uses deep learning to model how transcription factors bind to DNA at low-affinity sites, improving understanding of gene regulation.
Contribution
The study introduces and evaluates reverse-complement weight-sharing and data augmentation strategies for modeling low-affinity TF–DNA binding.
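The two strategies for handling sequence orientation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-kernel scan, and the max-pooling over strands are assumptions made for illustration only. Augmentation adds each sequence's reverse complement to the training set, while weight sharing scans both strands with the same kernel so that the score does not depend on which strand was read.

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array with columns ordered A, C, G, T."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def augment(seqs, labels):
    """Data augmentation: append each reverse complement with an unchanged label."""
    rc = [reverse_complement(s) for s in seqs]
    return list(seqs) + rc, list(labels) + list(labels)


def scan(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid cross-correlation of a (L, 4) sequence with a (k, 4) kernel."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(x.shape[0] - k + 1)])


def rc_shared_score(seq: str, kernel: np.ndarray) -> float:
    """Weight sharing (illustrative): one kernel scans both strands, best hit wins."""
    fwd = scan(one_hot(seq), kernel)
    rev = scan(one_hot(reverse_complement(seq)), kernel)
    return float(max(fwd.max(), rev.max()))


# Arbitrary example sequence and random kernel (not taken from the paper).
kernel = np.random.default_rng(0).normal(size=(4, 4))
seq = "TTGACCAGT"
print(rc_shared_score(seq, kernel), rc_shared_score(reverse_complement(seq), kernel))
```

With the A, C, G, T column order, the one-hot encoding of a reverse complement equals the original array flipped along both axes, which is why a shared kernel yields the same score for a sequence and its reverse complement.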
Findings
Reverse-complement weight-sharing CNN models and SA models trained on augmented data outperformed the other approaches in modeling TF–DNA binding.
In silico mutagenesis (ISM) was less sensitive to model hyperparameters than other interpretation methods.
Low-affinity binding sites for Exd-Ubx were identified, suggesting possible biophysical mechanisms.
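The ISM interpretation method mentioned in the findings can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's code: `toy_predict` is a hypothetical stand-in for a trained model, and the motif it counts is arbitrary. ISM substitutes every position with each alternative base and records how much the model's prediction changes relative to the unmutated sequence.

```python
import numpy as np

BASES = "ACGT"


def in_silico_mutagenesis(seq: str, predict) -> np.ndarray:
    """Return a (len(seq), 4) array of predict(mutant) - predict(seq).

    Rows are positions, columns are substituted bases in A, C, G, T order.
    Entries for the reference base stay at 0.
    """
    ref = predict(seq)
    effects = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, base in enumerate(BASES):
            if base == seq[i]:
                continue  # not a mutation
            mutant = seq[:i] + base + seq[i + 1:]
            effects[i, j] = predict(mutant) - ref
    return effects


def toy_predict(seq: str) -> float:
    """Hypothetical model: counts occurrences of the arbitrary 4-mer 'TGAT'."""
    return float(sum(seq[i:i + 4] == "TGAT" for i in range(len(seq) - 3)))


effects = in_silico_mutagenesis("AATGATCC", toy_predict)
# Positions whose mutation lowers the prediction mark the bases the model relies on.
important = np.where(effects.min(axis=1) < 0)[0]
print(important)
```

In practice `predict` would wrap a trained CNN or SA model, and the resulting matrix is often displayed as a heat map or summarized per position.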
Abstract
Multiple layers of molecular determinants and mechanisms affect binding specificity between transcription factors (TFs) and DNA. DNA sequence-based deep learning models using convolutional neural networks (CNNs) and self-attention (SA) transformers have improved modeling accuracy and advanced our understanding of TF–DNA binding specificity through network interpretation. However, the systematic evaluation of various strategies for handling DNA sequence orientations in deep learning models—and their interpretation—remains underexplored, especially in the context of learning low-affinity binding site specificity. Using SELEX-seq data for eight Exd-Hox heterodimers in Drosophila, we compared canonical models with data augmentation and reverse-complement weight-sharing models. We found that reverse-complement weight-sharing CNN models and SA models trained with augmented data with reverse…
Taxonomy
Topics: Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics · Gene Regulatory Network Analysis
