DNABERT2-CAMP: A Hybrid Transformer-CNN Model for E. coli Promoter Recognition
Hua-Lin Xu, Xiu-Jun Gong, Hua Yu, Ying-Kai Wang

TL;DR
This paper introduces DNABERT2-CAMP, a new deep learning model that improves the accuracy of identifying promoter regions in E. coli by combining global and local sequence features.
Contribution
The novel hybrid model integrates a pre-trained Transformer with a custom CNN module for enhanced promoter recognition and interpretability.
Findings
DNABERT2-CAMP achieved 93.10% accuracy and 97.28% ROC AUC in cross-validation.
The model maintained strong performance on an independent test set with 89.83% accuracy and 92.79% ROC AUC.
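For readers unfamiliar with the two reported metrics, the following is a minimal pure-Python sketch of how accuracy and ROC AUC can be computed from binary labels and model scores. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the paper, and the authors' evaluation code may differ.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of sequences whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = promoter, 0 = non-promoter.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"ROC AUC  = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the chosen decision threshold, whereas ROC AUC depends only on how the scores rank positives against negatives, which is why the two figures can differ substantially on the same predictions.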
Abstract
Background: Accurate recognition of promoter sequences in Escherichia coli is fundamental for understanding gene regulation and engineering synthetic biological systems. However, existing computational methods struggle to simultaneously model long-range genomic dependencies and fine-grained local motifs, particularly the degenerate −10 and −35 elements of σ70 promoters. To address this gap, we propose DNABERT2-CAMP, a novel hybrid deep learning framework designed to integrate global contextual understanding with high-resolution local motif detection for robust promoter identification.

Methods: We constructed a balanced dataset of 8720 81-bp sequences, comprising experimentally validated promoters and negative samples, drawn from RegulonDB, the literature, and the E. coli K-12 genome. Our model combines a pre-trained DNABERT-2 Transformer for global sequence encoding with a custom CAMP module (CNN-Attention-Mean Pooling) for…
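The abstract is truncated here and does not give the layer details of the CAMP module, so the following is only a toy sketch of the three stages its name suggests: a 1D convolution, attention over positions, and mean pooling. Everything in it is an assumption made for illustration: one-hot nucleotide vectors stand in for DNABERT-2 token embeddings, the single kernel is hand-set to the σ70 −10 consensus TATAAT rather than learned, and the names `conv1d`, `attention_weights`, and `camp_head` are hypothetical.

```python
import math

# One-hot vectors stand in for Transformer token embeddings (assumption).
ONEHOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def conv1d(x, kernels):
    """Valid 1D convolution followed by ReLU.
    x: list of L vectors of dimension D.
    kernels: list of C kernels, each a list of K vectors of dimension D.
    Returns a list of L-K+1 feature vectors of dimension C."""
    k = len(kernels[0])
    out = []
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        feats = []
        for kern in kernels:
            s = sum(w * v for wrow, vrow in zip(kern, window) for w, v in zip(wrow, vrow))
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def attention_weights(feats, query):
    """Softmax over positions of the dot product between features and a query."""
    logits = [sum(f * q for f, q in zip(row, query)) for row in feats]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def camp_head(x, kernels, query):
    """Convolve, reweight each position by its attention weight, then mean-pool."""
    feats = conv1d(x, kernels)
    weights = attention_weights(feats, query)
    reweighted = [[w * f for f in row] for w, row in zip(weights, feats)]
    pooled = [sum(col) / len(feats) for col in zip(*reweighted)]
    return pooled, weights


# Toy input (hypothetical): a short sequence containing the -10 consensus TATAAT.
seq = "GCGCTATAATGCGC"
x = [ONEHOT[b] for b in seq]
kernels = [[ONEHOT[b] for b in "TATAAT"]]  # one hand-set motif detector
query = [1.0]                              # one weight per conv channel

pooled, weights = camp_head(x, kernels, query)
best = max(range(len(weights)), key=weights.__getitem__)
print("attention peaks at position", best, "->", seq[best:best + 6])
```

In this toy setting the attention weights concentrate on the window where the kernel matches the motif exactly, which illustrates how a convolutional head can contribute local motif sensitivity on top of a global sequence encoder.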
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Chromatin Dynamics · Bacterial Genetics and Biotechnology
