Leveraging Natural Language Processing to Unravel the Mystery of Life: A Review of NLP Approaches in Genomics, Transcriptomics, and Proteomics
Ella Rannon, David Burstein

TL;DR
This review discusses how NLP techniques, from classic to transformer-based models, are being adapted to analyze biological sequences in genomics, transcriptomics, and proteomics, revealing new insights into biological data.
Contribution
It provides a comprehensive overview of NLP methods applied to biological sequences, evaluating their strengths, limitations, and potential for future bioinformatics applications.
Findings
NLP models are increasingly used for biological sequence analysis.
Transformer models improve accuracy in gene and structure prediction.
NLP approaches enable large-scale genomic data interpretation.
Abstract
Natural Language Processing (NLP) has transformed various fields beyond linguistics by applying techniques originally developed for human language to the analysis of biological sequences. This review explores the application of NLP methods to biological sequence data, focusing on genomics, transcriptomics, and proteomics. We examine how various NLP methods, from classic approaches like word2vec to advanced models employing transformers and hyena operators, are being adapted to analyze DNA, RNA, protein sequences, and entire genomes. The review also examines tokenization strategies and model architectures, evaluating their strengths, limitations, and suitability for different biological tasks. We further cover recent advances in NLP applications for biological data, such as structure prediction, gene expression, and evolutionary analysis, highlighting the potential of these methods for…
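As a rough illustration of what a tokenization strategy for biological sequences can look like, the sketch below splits a sequence into k-mers, one widely used scheme. The abstract does not say which strategies the review covers, so the choice of k-mers is an assumption here. The function names (`kmer_tokenize`, `build_vocab`) and the example sequence are also hypothetical and do not come from the paper.

```python
from collections import Counter


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a nucleotide or amino-acid sequence into k-mer tokens.

    stride=1 yields overlapping k-mers; stride=k yields non-overlapping ones.
    Trailing characters that do not fill a whole k-mer are dropped.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer id, most frequent token first."""
    counts = Counter(tokens)
    return {tok: idx for idx, (tok, _) in enumerate(counts.most_common())}


dna = "ATGCGATACG"  # made-up example sequence, not taken from the review
overlapping = kmer_tokenize(dna, k=3, stride=1)
non_overlapping = kmer_tokenize(dna, k=3, stride=3)

# Integer ids like these are what an embedding layer (word2vec-style or
# transformer-based) would take as input.
vocab = build_vocab(overlapping)
ids = [vocab[token] for token in overlapping]
```

Overlapping k-mers keep more local context but produce a longer token sequence, while non-overlapping k-mers shorten the input and make the result depend on where the reading frame starts.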
Taxonomy
Topics: Misinformation and Its Impacts · Genetics, Bioinformatics, and Biomedical Research · Topic Modeling
