Genomic Language Models: Opportunities and Challenges

Gonzalo Benegas; Chengzhong Ye; Carlos Albors; Jianan Canal Li; Yun S. Song

arXiv:2407.11435 · q-bio.GN · September 24, 2024

TL;DR

Genomic Language Models (gLMs), large language models trained on DNA sequences, hold promise for advancing our understanding of genomes but face significant development challenges, especially for species with large, complex genomes.

Contribution

This paper reviews the opportunities, applications, and challenges of developing effective gLMs for biological sequence analysis.

Findings

1. gLMs can predict functional constraints and aid in sequence design
2. Transfer learning enhances gLM applications in genomics
3. Developing gLMs for complex genomes remains challenging

Abstract

Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and…
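One common way to use a masked gLM for functional constraint prediction is zero-shot variant scoring: mask the variant position, then compare the log-probability the model assigns to the alternate nucleotide with that of the reference nucleotide. The sketch illustrates only this log-likelihood-ratio idea and is not presented as the authors' method. `ToyContextModel` and `llr_score` are hypothetical names; the toy model is an assumed stand-in for a trained gLM that looks at just one flanking nucleotide on each side, using add-one-smoothed counts from a tiny corpus.

```python
import math
from collections import Counter, defaultdict

NUCS = "ACGT"


class ToyContextModel:
    """Hypothetical stand-in for a masked gLM (an assumption, not a real gLM).

    Predicts the nucleotide at a masked position from its immediate left and
    right neighbours, using add-one-smoothed counts from a training corpus.
    """

    def __init__(self, corpus):
        # counts[(left, right)][nucleotide] = occurrences in the corpus
        self.counts = defaultdict(Counter)
        for seq in corpus:
            for i in range(1, len(seq) - 1):
                self.counts[(seq[i - 1], seq[i + 1])][seq[i]] += 1

    def masked_probs(self, seq, pos):
        """Return P(nucleotide at pos | neighbours) for A, C, G, T.

        The base at `pos` itself is never read, which mimics masking it.
        """
        left = seq[pos - 1] if pos > 0 else None
        right = seq[pos + 1] if pos < len(seq) - 1 else None
        c = self.counts[(left, right)]
        total = sum(c[n] for n in NUCS) + len(NUCS)
        return {n: (c[n] + 1) / total for n in NUCS}


def llr_score(model, ref_seq, pos, alt):
    """Log-likelihood ratio log P(alt) - log P(ref) at a masked position.

    Strongly negative scores mean the model disfavours the alternate allele,
    which is often read as a sign of functional constraint.
    """
    probs = model.masked_probs(ref_seq, pos)
    return math.log(probs[alt]) - math.log(probs[ref_seq[pos]])


if __name__ == "__main__":
    model = ToyContextModel(["ACGT" * 10])
    # Substitute the C at position 1 of a reference window with T.
    score = llr_score(model, "ACGTACGT", 1, "T")
    print(round(score, 3))  # -2.398
```

Because the toy corpus always places C between A and G, the model strongly prefers the reference C, so the C→T substitution receives a negative score. A real gLM would replace `masked_probs` with the model's output distribution at a masked token, conditioned on much longer context.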
