# Linking hierarchical classification of transcription factors by the structure of their DNA-binding domains to the variability of their binding site motifs

**Authors:** V.G. Levitsky, T.Yu. Vatolina, V.V. Raditsa

PMC · DOI: 10.18699/vjgb-25-99 · Vavilov Journal of Genetics and Breeding · 2025-12-01

## TL;DR

This paper explores how the structure of DNA-binding domains in transcription factors relates to the similarity of their binding site motifs, aiming to improve the identification of transcription factors from sequencing data.

## Contribution

The study supplements the TFClass structural hierarchy of transcription factors (TFs) with branches: non-overlapping sets of TFs within which most TF pairs have significantly similar binding site motifs.

## Key findings

- TFs with similar DNA-binding domain structures often have similar binding site motifs.
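- Grouping TFs into motif-similar branches can make it more efficient to identify the TFs involved from enriched motifs found by de novo motif search in ChIP-seq data.
- The degree of motif similarity differs noticeably between corresponding hierarchy levels (classes, families) across the seven largest human and two largest Drosophila TF classes.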
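
To make "similar binding site motifs" concrete, here is a minimal sketch of one common way to score a pair of motifs: the best mean column-wise Pearson correlation over ungapped alignments, on either strand. The summary does not say which similarity measure or significance test the authors used, so the function names, the `min_overlap` parameter, and the scoring scheme are all assumptions made for illustration.

```python
from math import sqrt

# A motif is a list of columns; each column holds frequencies for A, C, G, T.

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def reverse_complement(motif):
    """Reverse the column order and swap A<->T, C<->G within each column."""
    return [col[::-1] for col in reversed(motif)]

def best_ungapped_score(m1, m2, min_overlap):
    """Best mean column correlation over all ungapped offsets (one strand)."""
    best = -1.0
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        cols = [(m1[i], m2[i - offset])
                for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(cols) < min_overlap:
            continue
        score = sum(pearson(a, b) for a, b in cols) / len(cols)
        best = max(best, score)
    return best

def motif_similarity(m1, m2, min_overlap=4):
    """Hypothetical similarity score in [-1, 1]; returns -1.0 if no valid overlap."""
    return max(best_ungapped_score(m1, m2, min_overlap),
               best_ungapped_score(m1, reverse_complement(m2), min_overlap))

# Toy motif (consensus ACGT) and a copy with one uninformative flanking column.
core = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
flanked = [[0.25, 0.25, 0.25, 0.25]] + core
```

Calling `motif_similarity(core, flanked)` finds the offset at which the four informative columns line up. Turning such a score into a "significantly similar" call would need a threshold or a p-value model, which this sketch leaves out.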

## Abstract

De novo motif search is the main approach for determining the nucleotide specificity of binding of the key regulators of gene transcription, transcription factors (TFs), based on data from massive genome-wide sequencing of their binding site regions in vivo, such as ChIP-seq. The number of motifs of known TF binding sites (TFBSs) has increased several times in recent years. Due to the similarity in the structure of the DNA-binding domains of TFs, many structurally cognate TFs have similar and sometimes almost indistinguishable binding site motifs. The classification of TFs by the structure of the DNA-binding domains from the TFClass database defines the top levels of the hierarchy (superclasses and classes of TFs) by the structure of these domains, and the next levels (families and subfamilies of TFs) by the alignments of amino acid sequences of domains. However, this classification does not take into account the similarity of TFBS motifs, whereas identification of valid TFs from massive sequencing data of TFBSs, such as ChIP-seq, requires working with TFBS motifs rather than TFs themselves. Therefore, in this study we extracted from the Hocomoco and Jaspar databases the TFBS motifs for human and fruit fly Drosophila melanogaster, and considered the pairwise similarity of binding site motifs of cognate TFs according to their classification from the TFClass database. We have shown that the common tree of the TF hierarchy by the structure of DNA-binding domains can be split into separate branches representing non-overlapping sets of TFs. Within each branch, the majority of TF pairs have significantly similar binding site motifs. Each branch can include one or more sister elementary units of the hierarchy and all its/their lower levels: one or more TFs of the same subfamily, or the whole subfamily, one or several subfamilies of the same family, an entire family, etc., up to the entire class.
Analysis of the seven largest human and two largest Drosophila TF classes showed that the similarity of TFs in terms of TFBS motifs for different corresponding levels (classes, families) is noticeably different. Supplementing the hierarchical classification of TFs with branches combining significantly similar motifs of TFBSs can increase the efficiency of identifying involved TFs through enriched motifs detected by de novo motif search for massive sequencing data of TFBSs from the ChIP-seq technology.
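
The branch idea in the abstract can be illustrated with a small top-down sketch: walk the hierarchy from a class downward and keep the highest unit in which more than half of the TF pairs are flagged as similar. This is not the authors' procedure. It simplifies by allowing only whole units as branches, whereas the abstract also permits several sister units to be combined. The names (`split_into_branches`, `min_fraction`, the toy TF and family labels) and the 0.5 cutoff are hypothetical.

```python
from itertools import combinations

def leaves(node):
    """Collect TF names under a node; a list is a lowest-level unit of TFs."""
    if isinstance(node, list):
        return list(node)
    return [tf for child in node.values() for tf in leaves(child)]

def similar_fraction(tfs, similar_pairs):
    """Fraction of TF pairs flagged as similar (1.0 for a single TF)."""
    pairs = list(combinations(sorted(tfs), 2))
    if not pairs:
        return 1.0
    hits = sum(frozenset(p) in similar_pairs for p in pairs)
    return hits / len(pairs)

def split_into_branches(name, node, similar_pairs, min_fraction=0.5):
    """Return non-overlapping (unit name, TF list) branches, top-down."""
    tfs = leaves(node)
    if similar_fraction(tfs, similar_pairs) > min_fraction:
        return [(name, sorted(tfs))]
    if isinstance(node, list):
        # A lowest-level unit that is not coherent falls back to single TFs.
        return [(tf, [tf]) for tf in sorted(tfs)]
    branches = []
    for child_name, child in node.items():
        branches.extend(
            split_into_branches(child_name, child, similar_pairs, min_fraction))
    return branches

# Toy hierarchy with made-up labels: class -> families -> subfamilies -> TFs.
toy_class = {
    "family_A": {"subfamily_A1": ["TF1", "TF2"], "subfamily_A2": ["TF3"]},
    "family_B": {"subfamily_B1": ["TF4", "TF5"]},
}
# Pairs assumed to have significantly similar motifs (input to this sketch).
similar = {frozenset(p) for p in [("TF1", "TF2"), ("TF1", "TF3"), ("TF2", "TF3")]}

branches = split_into_branches("class_X", toy_class, similar)
```

In this toy case the whole class is not coherent (3 of 10 pairs are similar), `family_A` becomes one branch because all of its pairs are similar, and `TF4` and `TF5` end up as single-TF branches. An enriched de novo motif could then be matched against branches rather than against individual TFs.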

## Linked entities

- **Species:** Homo sapiens (taxon 9606), Drosophila melanogaster (taxon 7227)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Drosophila melanogaster (fruit fly, species) [taxon 7227]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12795858/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12795858/full.md

---
Source: https://tomesphere.com/paper/PMC12795858