iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models

Jiahao Li; Zhourun Wu; Wenhao Lin; Jiawei Luo; Jun Zhang; Qingcai Chen; and Junjie Chen

arXiv:2212.01495 · q-bio.GN · July 18, 2023

TL;DR

iEnhancer-ELM introduces a BERT-like model that leverages multi-scale k-mer tokenization and attention mechanisms to effectively identify enhancers by capturing position-related multiscale contextual information from DNA sequences.

Contribution

The paper presents a novel enhancer identification method using enhancer language models with multi-scale k-mers and attention, outperforming existing methods and providing biological interpretability.

Findings

01. Outperforms state-of-the-art enhancer identification methods on benchmark datasets.

02. Identifies 30 enhancer motifs, with 12 verified by established databases.

03. Demonstrates interpretability and potential biological insights of the model.

Abstract

Motivation: Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences.

Results: In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts contextual information of k-mers at different scales, related to their positions, via a multi-head attention mechanism. We first evaluate the performance of k-mers at each scale, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show…
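To make the multi-scale k-mer idea concrete, here is a minimal sketch of tokenizing one DNA sequence at several values of k and then combining per-scale predictions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kmer_tokenize`, `multiscale_tokenize`, `average_probabilities`) are hypothetical, the overlapping stride-1 windows and the default scales `(3, 4, 5, 6)` are assumed, and plain probability averaging stands in for the ensembling step, whose exact form the abstract does not specify.

```python
from typing import Dict, List, Sequence


def kmer_tokenize(sequence: str, k: int) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping k-mers;
    # if the sequence is shorter than k, the result is an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multiscale_tokenize(
    sequence: str, scales: Sequence[int] = (3, 4, 5, 6)
) -> Dict[int, List[str]]:
    """Tokenize the same sequence once per scale k (default scales are assumed)."""
    return {k: kmer_tokenize(sequence, k) for k in scales}


def average_probabilities(per_scale_probs: Dict[int, float]) -> float:
    """Combine per-scale enhancer probabilities by simple averaging.

    Averaging is a stand-in chosen for illustration; the source does not
    state which ensembling rule is used.
    """
    if not per_scale_probs:
        raise ValueError("need at least one per-scale probability")
    return sum(per_scale_probs.values()) / len(per_scale_probs)


if __name__ == "__main__":
    tokens = multiscale_tokenize("ACGTAC", scales=(3, 4))
    print(tokens[3])  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(tokens[4])  # ['ACGT', 'CGTA', 'GTAC']
```

In a setup like the one the abstract describes, each token list would feed a separate language model for that scale, and the per-scale outputs would then be combined; the averaging helper only marks where that combination would happen.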

Code & Models

Repositories

chen-bioinfo/ienhancer-elm
PyTorch · Official

Taxonomy

Topics: RNA and protein synthesis mechanisms · Antimicrobial Peptides and Activities · Genomics and Chromatin Dynamics

Methods: Softmax · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings