TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation
Saima Gaffar, Kil To Chong, Hilal Tayara

TL;DR
This paper introduces a computational model using ProtBert to identify transcription factors and those that bind to methylated DNA, improving on existing methods.
Contribution
A novel two-layer SVM framework that uses ProtBert's latent space representation to detect TFs and the TFs that bind to methylated DNA (TFPMs), outperforming current approaches.
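The two-layer idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each protein has already been converted to a fixed-length embedding vector (in the paper this comes from ProtBert; here random synthetic vectors stand in so the example runs without downloading a model). The function names `train_two_layer` and `predict_two_layer`, the RBF kernel, the 32-dimensional vectors, and the cluster layout are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC


def train_two_layer(X, is_tf, is_tfpm):
    """Fit two SVMs: layer 1 on all proteins, layer 2 on TFs only.

    X       : (n_samples, n_features) embedding matrix (assumed precomputed)
    is_tf   : 1 if the protein is a TF, else 0
    is_tfpm : 1 if the TF binds methylated DNA, else 0 (ignored for non-TFs)
    """
    layer1 = SVC(kernel="rbf").fit(X, is_tf)           # TF vs non-TF
    tf_mask = is_tf == 1
    layer2 = SVC(kernel="rbf").fit(X[tf_mask], is_tfpm[tf_mask])  # TFPM vs other TF
    return layer1, layer2


def predict_two_layer(layer1, layer2, X):
    """Cascade: only proteins predicted as TFs are passed to layer 2."""
    labels = np.full(len(X), "non-TF", dtype=object)
    tf_pred = layer1.predict(X) == 1
    if tf_pred.any():
        methylated = layer2.predict(X[tf_pred]) == 1
        labels[tf_pred] = np.where(methylated, "TFPM", "TF")
    return labels


# Synthetic stand-in for embeddings: three well-separated clusters.
rng = np.random.default_rng(0)
n, d = 30, 32  # hypothetical sizes, chosen only to keep the demo fast
X = np.vstack([
    rng.normal(-3.0, 1.0, (n, d)),  # non-TF proteins
    rng.normal(0.0, 1.0, (n, d)),   # TFs that are not TFPMs
    rng.normal(3.0, 1.0, (n, d)),   # TFPMs
])
is_tf = np.r_[np.zeros(n, dtype=int), np.ones(2 * n, dtype=int)]
is_tfpm = np.r_[np.zeros(2 * n, dtype=int), np.ones(n, dtype=int)]
truth = np.array(["non-TF"] * n + ["TF"] * n + ["TFPM"] * n, dtype=object)

layer1, layer2 = train_two_layer(X, is_tf, is_tfpm)
labels = predict_two_layer(layer1, layer2, X)
train_accuracy = float((labels == truth).mean())
print(f"training accuracy on synthetic data: {train_accuracy:.2f}")
```

Training the second layer only on known TFs keeps it focused on the methylation-binding distinction, while the cascade at prediction time means a protein rejected by layer 1 is never scored by layer 2. Real use would replace the synthetic matrix with actual protein embeddings and evaluate on held-out data rather than the training set.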
Findings
The model reliably predicts transcription factors and those that bind to methylated DNA.
It outperforms state-of-the-art methods on both balanced and imbalanced datasets.
Performance is validated through cross-validation and independent testing.
Abstract
Transcription factors (TFs) are fundamental regulators of gene expression and perform diverse functions in cellular processes. The management of 3-dimensional (3D) genome conformation and gene expression relies primarily on TFs. They recruit transcriptional machinery to the enhancers or promoters of specific genes, thereby activating or inhibiting transcription. Identifying these TFs is a significant step towards understanding cellular gene expression mechanisms. Because experimental methods are time-consuming and labor-intensive, the development of computational models is essential. In this work, we introduced a two-layer prediction framework based on a support vector machine (SVM) using the latent space representation of a protein language model, ProtBert. The first layer of the…
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
