TFBS-Finder: Deep Learning-based Model with DNABERT and Convolutional Networks to Predict Transcription Factor Binding Sites
Nimisha Ghosh, Pratik Dutta, Daniele Santoni

TL;DR
TFBS-Finder is a deep learning model combining DNABERT embeddings with CNN and attention modules, achieving superior accuracy in predicting transcription factor binding sites across multiple datasets.
Contribution
It introduces a novel deep learning architecture integrating DNABERT, CNN, and attention mechanisms for improved TFBS prediction.
Findings
Outperforms existing models in TFBS prediction accuracy
Effective in cross-cell line validation
Demonstrates the benefit of combining long-term dependencies with local features
Abstract
Transcription factors are proteins that regulate the expression of genes by binding to specific genomic regions known as Transcription Factor Binding Sites (TFBSs), typically located in the promoter regions of those genes. Accurate prediction of these binding sites is essential for understanding the complex gene regulatory networks underlying various cellular functions. Many deep learning models have been developed for this prediction task, but there is still room for improvement. In this work, we have developed a deep learning model that uses pre-trained DNABERT, a Convolutional Neural Network (CNN) module, a Modified Convolutional Block Attention Module (MCBAM), a Multi-Scale Convolutions with Attention (MSCA) module and an output module. The pre-trained DNABERT is used for sequence embedding, thereby capturing the long-term dependencies in the DNA sequences while the CNN,…
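As a rough illustration of the input side of a DNABERT-based pipeline: DNABERT, as originally published, does not read single nucleotides but overlapping k-mers, which are then mapped to vocabulary IDs and embedded. The sketch below shows only that tokenization step in plain Python. The function name kmer_tokenize and the choice k = 6 are assumptions made for illustration; the abstract does not state which k TFBS-Finder uses or how its preprocessing is implemented.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper for illustration; k = 6 is an assumed value,
    not one confirmed by the TFBS-Finder abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no complete k-mer.
    if len(seq) < k:
        return []
    # A sequence of length n yields n - k + 1 overlapping tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    tokens = kmer_tokenize("ATGCGTAC", k=6)
    print(tokens)  # ['ATGCGT', 'TGCGTA', 'GCGTAC']
```

Each token would then be looked up in the pre-trained model's vocabulary before being passed to the transformer; that step depends on the specific DNABERT checkpoint and is not shown here.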
Taxonomy
Topics: Machine Learning in Bioinformatics
Methods: Softmax · Attention Is All You Need
