Advancing bioinformatics with large language models: components, applications and perspectives
Jiajia Liu, Mengyuan Yang, Yankai Yu, Haixia Xu, Tiangang Wang, Kang Li, Xiaobo Zhou

TL;DR
This review explores how large language models are transforming bioinformatics. It details their components, surveys their applications across biological data types, and offers practical guidance for their effective use and development.
Contribution
It provides a comprehensive overview of LLM components, applications in bioinformatics, and practical strategies, highlighting their potential beyond natural language processing.
Findings
LLMs are effective in genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis.
Current foundation models enable diverse bioinformatics applications.
Guidelines are provided for optimizing LLM use and fostering innovation.
Abstract
Large language models (LLMs) are a class of deep-learning-based artificial intelligence models that perform strongly across many tasks, especially in natural language processing (NLP). LLMs typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we provide a comprehensive overview of the essential components of LLMs in bioinformatics, spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, the core attention mechanism, and the pre-training…
Taxonomy
Topics: RNA modifications and cancer · Machine Learning in Bioinformatics · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Discriminative Fine-Tuning · Linear Layer · Attention Dropout · Dropout · Adam · Layer Normalization · Residual Connection
