iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models
Jiahao Li, Zhourun Wu, Wenhao Lin, Jiawei Luo, Jun Zhang, Qingcai Chen, and Junjie Chen

TL;DR
iEnhancer-ELM introduces a BERT-like model that leverages multi-scale k-mer tokenization and attention mechanisms to effectively identify enhancers by capturing position-related multiscale contextual information from DNA sequences.
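As a rough illustration of what multi-scale k-mer tokenization means, the sketch below splits one DNA sequence into overlapping k-mers at several values of k. The function names, the stride of 1, and the default scales (3, 4, 5, 6) are assumptions made for this example; the summary does not state the exact settings used by iEnhancer-ELM.

```python
def kmer_tokenize(seq, k):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multiscale_tokenize(seq, scales=(3, 4, 5, 6)):
    """Tokenize the same sequence once per scale k (default scales are hypothetical)."""
    return {k: kmer_tokenize(seq, k) for k in scales}


if __name__ == "__main__":
    tokens = multiscale_tokenize("ACGTAC", scales=(3, 4))
    for k, kmers in tokens.items():
        print(k, kmers)
```

Each scale yields its own token sequence, so a BERT-like encoder can be applied to each one separately, with a token's index in the list carrying its position in the original sequence.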
Contribution
The paper presents a novel enhancer identification method using enhancer language models with multi-scale k-mers and attention, outperforming existing methods and providing biological interpretability.
Findings
Outperforms state-of-the-art enhancer identification methods on benchmark datasets.
Identifies 30 enhancer motifs, with 12 verified by established databases.
Demonstrates interpretability and potential biological insights of the model.
Abstract
Motivation: Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences. Results: In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts contextual information of different scale k-mers related to their positions via a multi-head attention mechanism. We first evaluate the performance of different scale k-mers, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show…
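The abstract says that models built on different k-mer scales are ensembled, but does not say how. One common option is soft voting, sketched below: average the enhancer probability predicted by each per-scale model and apply a threshold. The averaging rule, the 0.5 threshold, and the function names are assumptions for illustration, not the authors' confirmed procedure.

```python
def ensemble_probability(per_scale_probs):
    """Average enhancer probabilities keyed by k-mer scale (soft voting, an assumption)."""
    if not per_scale_probs:
        raise ValueError("need at least one per-scale probability")
    return sum(per_scale_probs.values()) / len(per_scale_probs)


def classify(per_scale_probs, threshold=0.5):
    """Label a sequence as an enhancer if the averaged probability meets the threshold."""
    return ensemble_probability(per_scale_probs) >= threshold


if __name__ == "__main__":
    # Hypothetical outputs of four models, one per k-mer scale.
    probs = {3: 0.62, 4: 0.71, 5: 0.48, 6: 0.66}
    print(ensemble_probability(probs), classify(probs))
```

Averaging lets a scale that is uncertain on a given sequence be outvoted by the others, which is one plausible reason an ensemble could improve on any single k.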
Taxonomy
Topics: RNA and protein synthesis mechanisms · Antimicrobial Peptides and Activities · Genomics and Chromatin Dynamics
Methods: Softmax · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
