Large Language Models for Bioinformatics

Wei Ruan; Yanjun Lyu; Jing Zhang; Jiazhang Cai; Peng Shu; Yang Ge; Yao Lu; Shang Gao; Yue Wang; Peilong Wang; Lin Zhao; Tao Wang; Yufang Liu; Luyang Fang; Ziyu Liu; Zhengliang Liu; Yiwei Li; Zihao Wu; Junhao Chen; Hanqi Jiang; Yi Pan; Zhenyuan Yang; Jingyuan Chen; Shizhe Liang; Wei Zhang; Terry Ma; Yuan Dou; Jianli Zhang; Xinyu Gong; Qi Gan; Yusong Zou; Zebang Chen; Yuanxin Qian; Shuo Yu; Jin Lu; Kenan Song; Xianqiao Wang; Andrea Sikora; Gang Li; Xiang Li; Quanzheng Li; Yingfeng Wang; Lu Zhang; Yohannes Abate; Lifang He; Wenxuan Zhong; Rongjie Liu; Chao Huang; Wei Liu; Ye Shen; Ping Ma; Hongtu Zhu; Yajun Yan; Dajiang Zhu; Tianming Liu

arXiv:2501.06271 · q-bio.QM · January 14, 2025

TL;DR

This survey reviews the development, applications, and challenges of bioinformatics-specific large language models (BioLMs), emphasizing their transformative potential in disease diagnosis, drug discovery, and vaccine development.

Contribution

It provides a comprehensive analysis of BioLMs' evolution, classification, training, applications, challenges, and future directions in bioinformatics.

Findings

1. BioLMs are increasingly used in disease diagnosis, drug discovery, and vaccine development.
2. Key challenges include data privacy, interpretability, biases, and domain adaptation.
3. Emerging trends point to more sophisticated biological and clinical applications.

Abstract

With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and…