EPINTLM: enhancer–promoter prediction with pretrained k-mer embeddings and residual cross-attention
Thi Lan Nguyen, Hien Quang Kha, Phat Ky Nguyen, Minh Huu Nhat Le, Duc-Trong Le, Nguyen Quoc Khanh Le

TL;DR
EPINTLM is a deep learning model that predicts enhancer–promoter interactions from DNA sequences and genomic features, reaching competitive accuracy on a benchmark spanning six human cell lines.
Contribution
EPINTLM introduces a novel deep learning framework with cross-attention and residual aggregation for enhancer-promoter interaction prediction.
Findings
EPINTLM achieves competitive AUROC and AUPR performance on a benchmark across six human cell lines.
Ablation studies show cross-attention and residual aggregation are key to model performance.
A unified preprocessing pipeline improves training efficiency and reproducibility.
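AUROC and AUPR summarize how well a model ranks true interactions above non-interacting pairs across all decision thresholds. As a minimal sketch (not the paper's evaluation code), AUROC can be computed as the probability that a randomly chosen positive outscores a randomly chosen negative, and AUPR is approximated here by average precision, one common estimator; the exact estimator used for EPINTLM is not stated in this summary. The function names and the toy labels and scores are illustrative only.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def average_precision(labels, scores):
    """Average precision: mean of precision@k taken at the rank of each positive."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()


# Toy example with made-up labels (1 = interacting pair) and model scores.
y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(y, s), 3))              # → 0.833
print(round(average_precision(y, s), 3))  # → 0.833
```

AUPR is usually reported alongside AUROC when positives are rare, because it is sensitive to how many false positives rank above the true interactions.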
Abstract
Enhancer–promoter interactions (EPIs) play an important role in gene regulation, yet experimental mapping remains costly and limited in coverage. As a result, computational approaches are commonly evaluated on curated benchmark datasets, which pose challenges related to long-range sequence modeling, multimodal feature integration, and reproducible preprocessing. In this study, we present EPINTLM (Enhancer–Promoter Interaction Nucleotide Transformer Large Model), a deep learning framework designed to investigate architectural strategies for EPI prediction under standardized benchmark settings. EPINTLM integrates DNA sequence representations and genomic features by leveraging pretrained k-mer embeddings from the Nucleotide Transformer and explicitly modeling intra- and inter-sequence dependencies through residual self-attention and bidirectional cross-attention. We additionally…
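The abstract describes pretrained k-mer embeddings combined through residual self-attention (within each sequence) and bidirectional cross-attention (between enhancer and promoter). The NumPy sketch below illustrates that data flow under loudly stated assumptions: sequences are split into non-overlapping 6-mers (a simplification of Nucleotide Transformer tokenization), a random table stands in for the frozen pretrained embeddings, every attention block is single-head, "residual" is read as adding each attention output back to its input, and a mean-pooled logistic head scores the pooled sequences concatenated with a hypothetical vector of N_FEAT genomic features. All names (kmer_tokenize, embed, attend, predict) and sizes are illustrative, not EPINTLM's implementation, and the weights are untrained.

```python
import numpy as np

# Hypothetical sketch: sizes, weights and helper names are illustrative assumptions.
rng = np.random.default_rng(0)
D = 32       # embedding width (assumed; real Nucleotide Transformer embeddings are wider)
N_FEAT = 4   # number of genomic features per enhancer-promoter pair (assumed)
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_tokenize(seq, k=6):
    """Split DNA into non-overlapping k-mers, dropping a trailing remainder shorter than k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def kmer_index(kmer):
    """Map an ACGT k-mer to an integer row in the embedding table (base-4 encoding)."""
    idx = 0
    for b in kmer:
        idx = idx * 4 + BASES[b]
    return idx


# Random table standing in for frozen pretrained 6-mer embeddings.
EMBED = rng.normal(scale=0.1, size=(4 ** 6, D))


def embed(seq):
    """Return a (num_kmers, D) matrix; k-mers with non-ACGT bases are skipped."""
    ids = [kmer_index(km) for km in kmer_tokenize(seq) if set(km) <= set(BASES)]
    return EMBED[ids]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def make_attention():
    """Query, key and value projections for one single-head attention block."""
    return [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]


def attend(query_seq, context_seq, params):
    """Scaled dot-product attention: each query token mixes values from the context."""
    Wq, Wk, Wv = params
    q, k, v = query_seq @ Wq, context_seq @ Wk, context_seq @ Wv
    weights = softmax(q @ k.T / np.sqrt(D))  # (len_query, len_context)
    return weights @ v                       # (len_query, D)


SELF_E, SELF_P = make_attention(), make_attention()
CROSS_EP, CROSS_PE = make_attention(), make_attention()
W_OUT = rng.normal(scale=0.1, size=(2 * D + N_FEAT,))


def predict(enhancer, promoter, genomic_features):
    e, p = embed(enhancer), embed(promoter)
    # Residual self-attention within each sequence.
    e = e + attend(e, e, SELF_E)
    p = p + attend(p, p, SELF_P)
    # Bidirectional cross-attention with residuals; both directions read the
    # pre-cross-attention states, so the update is symmetric.
    e2 = e + attend(e, p, CROSS_EP)
    p2 = p + attend(p, e, CROSS_PE)
    # Mean-pool each sequence, append genomic features, apply a logistic head.
    z = np.concatenate([e2.mean(axis=0), p2.mean(axis=0), genomic_features])
    return 1.0 / (1.0 + np.exp(-z @ W_OUT))


enhancer = "ACGT" * 30      # 120 bp -> 20 six-mers
promoter = "GGCCAATT" * 12  # 96 bp -> 16 six-mers
score = predict(enhancer, promoter, np.zeros(N_FEAT))
print(f"interaction probability: {score:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the point is the shape of the computation. An ablation in the spirit of the Findings would remove the two cross-attention terms or the residual additions from a trained model and compare AUROC and AUPR.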
Figure 1
Figure 2
Figure 3
