Epigenomic language models powered by Cerebras
Meredith V. Trotter, Cuong Q. Nguyen, Stephen Young, Rob T. Woodruff, Kim M. Branson

TL;DR
This paper introduces EBERT, a novel Transformer-based model that integrates DNA sequences and epigenetic data to improve cell type-specific gene regulation predictions, enabled by large-scale pre-training on the human genome with Cerebras hardware.
Contribution
The paper presents the first large-scale pre-trained epigenomic language model that combines DNA and epigenetic data, demonstrating enhanced transfer learning for biological tasks.
Findings
EBERT outperforms previous models on multiple benchmarks.
Inclusion of epigenetic data improves prediction accuracy.
Cerebras hardware enabled training of a complex model at scale.
Abstract
Large scale self-supervised pre-training of Transformer language models has advanced the field of Natural Language Processing and shown promise in cross-application to the biological 'languages' of proteins and DNA. Learning effective representations of DNA sequences using large genomic sequence corpuses may accelerate the development of models of gene regulation and function through transfer learning. However, to accurately model cell type-specific gene regulation and function, it is necessary to consider not only the information contained in DNA nucleotide sequences, which is mostly invariant between cell types, but also how the local chemical and structural 'epigenetic state' of chromosomes varies between cell types. Here, we introduce a Bidirectional Encoder Representations from Transformers (BERT) model that learns representations based on both DNA sequence and paired epigenetic…
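The abstract is cut off before it describes how EBERT encodes its inputs, so what follows is only a minimal sketch, under stated assumptions, of one way paired DNA/epigenetic inputs could be prepared for BERT-style self-supervised pre-training. It assumes (hypothetically) that DNA is split into overlapping k-mers, that every nucleotide carries one discrete epigenetic state label, and that the standard BERT masked-language-model recipe (select 15% of tokens; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged) is applied to the DNA tokens only. The functions `build_vocab`, `encode`, and `mask_tokens`, and the state-ID scheme, are illustrative and not taken from the paper.

```python
import random
from itertools import product

# Hypothetical special tokens, following BERT's conventions.
SPECIAL = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]


def build_vocab(k):
    """Map each special token and every DNA k-mer over ACGT to an integer ID."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {tok: i for i, tok in enumerate(SPECIAL + kmers)}


def encode(seq, states, k, vocab):
    """Return parallel (token_ids, state_ids) lists for one genomic window.

    Hypothetical scheme: overlapping k-mers with stride 1, each labelled with
    the epigenetic state of its centre base. State ID 0 is reserved for
    [CLS]/[SEP], so per-base state labels are shifted by +1.
    """
    seq = seq.upper()
    if len(states) != len(seq):
        raise ValueError("need exactly one epigenetic state per nucleotide")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    unk = vocab["[UNK]"]  # used for k-mers containing N or other symbols
    token_ids = [vocab["[CLS]"]] + [vocab.get(m, unk) for m in kmers] + [vocab["[SEP]"]]
    centre_states = [states[i + k // 2] + 1 for i in range(len(kmers))]
    state_ids = [0] + centre_states + [0]
    return token_ids, state_ids


def mask_tokens(token_ids, vocab, rng, mask_prob=0.15):
    """BERT-style masked-LM corruption applied to DNA tokens only.

    Each non-special position is selected with probability mask_prob; a
    selected token becomes [MASK] 80% of the time, a random k-mer 10% of the
    time, and stays unchanged otherwise. Labels are -100 at unselected
    positions (the default ignore_index of PyTorch's cross-entropy loss).
    Leaving the epigenetic state IDs unmasked is an assumption.
    """
    special_ids = {vocab[t] for t in SPECIAL}
    first_kmer_id = len(SPECIAL)
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = vocab["[MASK]"]
        elif r < 0.9:
            inputs[i] = rng.randrange(first_kmer_id, len(vocab))
        # else: keep the original token as the input
    return inputs, labels


if __name__ == "__main__":
    vocab = build_vocab(3)
    seq = "ACGTTGCA"
    states = [2, 2, 2, 0, 0, 1, 1, 1]  # toy per-base state labels
    ids, st = encode(seq, states, 3, vocab)
    masked, labels = mask_tokens(ids, vocab, random.Random(0))
    print(len(ids), st)  # 8 [0, 3, 3, 1, 1, 2, 2, 0]
```

In a model built this way, `token_ids` and `state_ids` would each index their own embedding table, and the two embeddings (plus a position embedding) would be summed before the Transformer layers, much as BERT adds segment embeddings to token embeddings. Whether EBERT fuses the two modalities by summation, concatenation, or another mechanism is not stated in this summary.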
Taxonomy
Topics: Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms · RNA modifications and cancer
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Dropout · Label Smoothing
