# Sequence-based modeling of low-affinity transcription factor–DNA binding through deep learning

**Authors:** Yingfei Wang, Jinsen Li, Tsu-Pei Chiu, Beibei Xin, Remo Rohs

PMC · DOI: 10.1093/nargab/lqag027 · 2026-03-05

## TL;DR

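Using SELEX-seq data for eight Exd-Hox heterodimers in Drosophila, this paper compares strategies for handling DNA sequence orientation in CNN and self-attention models of TF–DNA binding, with a focus on low-affinity binding sites and on how reliably the trained models can be interpreted.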

## Contribution

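The study systematically compares canonical models, data augmentation with reverse complements, and reverse-complement weight-sharing for modeling low-affinity TF–DNA binding, and evaluates several network interpretation methods on the resulting models.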
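As an illustration of the data augmentation strategy, the sketch below adds the reverse complement of every training sequence with the same label. The helper names `reverse_complement` and `augment_with_reverse_complements` are hypothetical and not taken from the paper's code; the authors' actual preprocessing may differ.

```python
# Minimal sketch of reverse-complement data augmentation.
# Function names are illustrative assumptions, not the paper's API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def augment_with_reverse_complements(sequences, labels):
    """Pair every sequence with its reverse complement, reusing the label."""
    aug_seqs, aug_labels = [], []
    for seq, label in zip(sequences, labels):
        aug_seqs.extend([seq, reverse_complement(seq)])
        aug_labels.extend([label, label])
    return aug_seqs, aug_labels
```

The label is duplicated because a double-stranded binding site is the same physical site regardless of which strand was sequenced.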

## Key findings

- Reverse-complement weight-sharing CNN models and self-attention (SA) models trained on data augmented with reverse complements outperformed the other approaches in modeling TF–DNA binding specificity.
- In silico mutagenesis (ISM) was less sensitive to model hyperparameter settings than the other interpretation methods evaluated, which included Gradient*input, DeconvNet, and DeepLIFT.
- Exd-Ubx binding at low-affinity sites was identified, and possible biophysical mechanisms for it were suggested.
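In silico mutagenesis can be sketched as follows: each position is substituted with every base in turn, and the change in the model's prediction relative to the unmutated sequence is recorded. Here `toy_score` is a hypothetical stand-in for a trained model, and its motif `TGAT` is an arbitrary assumption chosen only to make the example runnable.

```python
# Minimal ISM sketch. `toy_score` stands in for a trained model's
# prediction function; it is an assumption, not the paper's model.
def toy_score(seq: str) -> int:
    """Best match count of any 4-mer window against a made-up motif."""
    motif = "TGAT"
    return max(
        sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        for i in range(len(seq) - len(motif) + 1)
    )


def in_silico_mutagenesis(seq, score_fn, alphabet="ACGT"):
    """Return, per position, the score change caused by each substitution."""
    reference = score_fn(seq)
    effects = []
    for i in range(len(seq)):
        row = {}
        for base in alphabet:
            mutant = seq[:i] + base + seq[i + 1:]
            row[base] = score_fn(mutant) - reference
        effects.append(row)
    return effects


effects = in_silico_mutagenesis("ATGATC", toy_score)
```

Because ISM only queries model outputs, it does not depend on how gradients propagate through a particular architecture.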

## Abstract

Multiple layers of molecular determinants and mechanisms affect binding specificity between transcription factors (TFs) and DNA. DNA sequence-based deep learning models using convolutional neural networks (CNNs) and self-attention (SA) transformers have improved modeling accuracy and advanced our understanding of TF–DNA binding specificity through network interpretation. However, the systematic evaluation of various strategies for handling DNA sequence orientations in deep learning models—and their interpretation—remains underexplored, especially in the context of learning low-affinity binding site specificity. Using SELEX-seq data for eight Exd-Hox heterodimers in Drosophila, we compared canonical models with data augmentation and reverse-complement weight-sharing models. We found that reverse-complement weight-sharing CNN models and SA models trained with augmented data with reverse complements outperformed other approaches in modeling binding specificity. In this work, we evaluated several interpretation methods, including Gradient*input, DeconvNet, DeepLIFT, and in silico mutagenesis (ISM). Compared to other interpretation methods, ISM was less sensitive to model hyperparameter settings. In this work, we identified Exd-Ubx binding at low-affinity sites and suggested possible biophysical mechanisms. The findings of this study will be relevant for studying the functional role of low-affinity TF binding in gene regulatory mechanisms with possible implications on TF–DNA binding specificity guided protein design.
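The idea behind reverse-complement weight-sharing can be illustrated with a single position-specific filter that is applied to a sequence both as-is and in its reverse-complemented form, so that one set of weights scores both strands. This is a simplified sketch under assumed names (`scan`, `rc_kernel`, `rc_shared_score`) and a made-up kernel; it is not the architecture used in the paper.

```python
# Minimal sketch of reverse-complement weight sharing with one filter.
# The kernel representation (one base->weight dict per position) and all
# function names are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def scan(seq, kernel):
    """Slide the kernel along the sequence and return one score per window."""
    k = len(kernel)
    return [
        sum(kernel[j][seq[i + j]] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def rc_kernel(kernel):
    """Reverse the positions and swap complementary bases; no new weights."""
    return [{COMPLEMENT[b]: w for b, w in pos.items()} for pos in reversed(kernel)]


def rc_shared_score(seq, kernel):
    """Max over both strands and all positions, using the shared weights."""
    return max(scan(seq, kernel) + scan(seq, rc_kernel(kernel)))
```

With this construction, a sequence and its reverse complement receive the same score by design, whereas a canonical model has to learn that symmetry from the data.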

## Linked entities

- **Species:** Drosophila (taxon 7215)

## Full-text entities

- **Genes:** exd (extradenticle) [NCBI Gene 32567] {aka CG8933, DExd, Dm-EXD, Dmel\CG8933, Dpbx, Pbx1}, Ubx (Ultrabithorax) [NCBI Gene 42034] {aka BX-C, Bxl, CG10388, Cbx, DUbx, Dm Ubx}, tin (tinman) [NCBI Gene 42536] {aka CG7895, DROHOXHK4, DROHOXNK4, DmNK-4, Dmel\CG7895, HOX}, ovo (ovo) [NCBI Gene 31429] {aka CG15467, CG6824, Dmel\CG6824, Fs(1)K1103, Fs(1)K1237, Fs(1)K155}, Abd-B (Abdominal B) [NCBI Gene 47763] {aka 9, ABDB, Abd B, Abd0B, AbdB, AbdB(CA)[[26]]}
- **Chemicals:** nucleotide (MESH:D009711), adenine (MESH:D000225)
- **Species:** Drosophila melanogaster (fruit fly, species) [taxon 7227]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12961433/full.md

---
Source: https://tomesphere.com/paper/PMC12961433