An Evaluation of Large Language Models in Bioinformatics Research
Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun

TL;DR
This paper evaluates the capabilities of large language models like GPT in performing various bioinformatics tasks, highlighting their successes and limitations to guide future research in AI-driven biological data analysis.
Contribution
It provides a comprehensive assessment of LLMs in bioinformatics, demonstrating their potential and identifying current limitations for complex biological tasks.
Findings
LLMs can successfully perform many bioinformatics tasks with proper prompts.
Limitations exist for complex and specialized bioinformatics problems.
The study offers insights for future AI applications in bioinformatics.
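As a rough illustration of what "proper prompts" could look like for one of the evaluated tasks (identification of potential coding regions), the sketch below builds a prompt and pairs it with a simple deterministic check. The prompt wording, the function names build_orf_prompt and find_orfs, and the min_codons parameter are all assumptions made for illustration; none of them come from the paper, and no model call is shown.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_orf_prompt(dna: str) -> str:
    """Build a hypothetical prompt for coding-region identification.

    The wording is an assumption for illustration, not the paper's prompt.
    """
    return (
        "You are a bioinformatics assistant. "
        "List every potential coding region (open reading frame) on the "
        "forward strand of the DNA sequence below. An ORF starts with ATG "
        "and ends at the first in-frame stop codon (TAA, TAG, or TGA).\n"
        f"Sequence: {dna}"
    )


def find_orfs(dna: str, min_codons: int = 2) -> List[str]:
    """Reference ORF finder (forward strand only) for checking a model's answer.

    Every ATG is treated as a candidate start, so nested starts that share
    a stop codon are reported as separate ORFs.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orf = dna[start:pos + 3]
                if len(orf) // 3 >= min_codons:
                    orfs.append(orf)
                break
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATAGGG"
    print(build_orf_prompt(sequence))
    print(find_orfs(sequence))
```

A deterministic reference like find_orfs gives something concrete to compare a model's free-text answer against, which is one simple way to score this kind of task.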
Abstract
Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance of LLMs on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks. In addition, we provide a thorough analysis of their limitations in the context…
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research
Methods: Linear Layer · Cosine Annealing · Residual Connection · Weight Decay · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Dropout · Attention Dropout · Dense Connections
