# TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation

**Authors:** Saima Gaffar, Kil To Chong, Hilal Tayara

PMC · DOI: 10.3390/ijms26094234 · 2025-04-29

## TL;DR

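This paper introduces TFProtBert, a two-layer support vector machine (SVM) model built on ProtBert latent space representations. The first layer identifies transcription factors (TFs); the second identifies TFs that prefer binding to methylated DNA (TFPMs). The authors report that it outperforms existing methods.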

## Contribution

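A two-layer support vector machine (SVM) framework that uses ProtBert's latent space representation to detect transcription factors (TFs) in the first layer and TFs that prefer binding to methylated DNA (TFPMs) in the second, reported to outperform current approaches.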

## Key findings

- The model predicts transcription factors (TFs) and, among them, those that prefer binding to methylated DNA (TFPMs).
- It is reported to outperform state-of-the-art methods on both balanced and imbalanced datasets.
- Performance is validated through cross-validation and independent testing.
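Why results on an imbalanced dataset need separate attention can be shown with a small worked example. This summary does not state which metrics the authors report, so the sketch below assumes four commonly used ones (accuracy, sensitivity, specificity, and the Matthews correlation coefficient); the function name `binary_metrics` and the confusion-matrix counts are hypothetical, chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Illustrative helper only; the metric set is an assumption, not taken
    from the paper.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical imbalanced test set: 10 positives, 90 negatives.
metrics = binary_metrics(tp=5, fp=2, tn=88, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.93 even though only half of the positives are recovered (sensitivity 0.5), which is why a single accuracy figure can overstate performance when the classes are imbalanced.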

## Abstract

Transcription factors (TFs) are fundamental regulators of gene expression and perform diverse functions in cellular processes. The management of 3-dimensional (3D) genome conformation and gene expression relies primarily on TFs. They attract transcriptional machinery to the enhancers or promoters of specific genes, thereby activating or inhibiting transcription. Identifying these TFs is a significant step towards understanding cellular gene expression mechanisms. Because experimental methods are time-consuming and labor-intensive, the development of computational models is essential. In this work, we introduce a two-layer prediction framework based on a support vector machine (SVM) using the latent space representation of a protein language model, ProtBert. The first layer of the method predicts and identifies transcription factors (TFs), and the second layer predicts and identifies transcription factors that prefer binding to methylated deoxyribonucleic acid (TFPMs). We also tested the proposed method on an imbalanced database. In detecting TFs and TFPMs, the proposed model consistently outperformed state-of-the-art approaches, as demonstrated by performance comparisons via empirical cross-validation analysis and independent tests.
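The two-layer flow described in the abstract can be sketched as a simple cascade: every protein embedding passes through a TF classifier, and only those predicted to be TFs reach the second, TFPM classifier. This is a minimal sketch, not the authors' implementation. The names `two_layer_predict` and `ThresholdStub` are hypothetical, the stub is a toy stand-in for a trained SVM, and the two-dimensional vectors stand in for ProtBert embeddings. Any object exposing a `predict` method that returns 0/1 labels (the interface used by scikit-learn's `SVC`, for example) could fill either slot.

```python
from typing import Protocol, Sequence

Vector = Sequence[float]


class BinaryClassifier(Protocol):
    """Anything with a predict() that maps feature vectors to 0/1 labels."""

    def predict(self, X: Sequence[Vector]) -> Sequence[int]: ...


class ThresholdStub:
    """Toy stand-in for a trained SVM (assumption, for illustration only).

    Predicts 1 when the feature at `index` exceeds `cutoff`.
    """

    def __init__(self, index: int, cutoff: float) -> None:
        self.index = index
        self.cutoff = cutoff

    def predict(self, X: Sequence[Vector]) -> list:
        return [1 if x[self.index] > self.cutoff else 0 for x in X]


def two_layer_predict(
    embeddings: Sequence[Vector],
    tf_layer: BinaryClassifier,
    tfpm_layer: BinaryClassifier,
) -> list:
    """Label each embedding as 'non-TF', 'TF', or 'TFPM'."""
    labels = ["non-TF"] * len(embeddings)

    # Layer 1: decide which proteins are transcription factors.
    is_tf = tf_layer.predict(embeddings)
    tf_indices = [i for i, flag in enumerate(is_tf) if flag == 1]

    # Layer 2: only predicted TFs are checked for methylated-DNA preference.
    if tf_indices:
        prefers = tfpm_layer.predict([embeddings[i] for i in tf_indices])
        for i, flag in zip(tf_indices, prefers):
            labels[i] = "TFPM" if flag == 1 else "TF"
    return labels


# Made-up 2-D "embeddings"; real ProtBert vectors are much higher-dimensional.
toy_embeddings = [[0.9, 0.8], [0.9, 0.1], [0.2, 0.7]]
result = two_layer_predict(
    toy_embeddings,
    tf_layer=ThresholdStub(index=0, cutoff=0.5),
    tfpm_layer=ThresholdStub(index=1, cutoff=0.5),
)
print(result)  # ['TFPM', 'TF', 'non-TF']
```

One consequence of this design is that the second layer never sees proteins rejected by the first, so first-layer errors propagate: a TF missed in layer 1 cannot be recovered as a TFPM in layer 2.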

## Linked entities

- **Proteins:** tf.S (transferrin S homeolog)
- **Chemicals:** deoxyribonucleic acid (PubChem CID 44135672)

## Full-text entities

- **Chemicals:** TFProtBert (-)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12071566/full.md

---
Source: https://tomesphere.com/paper/PMC12071566