Benchmarking large language models for biomedical natural language processing applications and recommendations
Qingyu Chen, Yan Hu, Xueqing Peng, Qianqian Xie, Qiao Jin, Aidan Gilson, Maxwell B. Singer, Xuguang Ai, Po-Ting Lai, Zhizheng Wang, Vipina Kuttichi Keloth, Kalpana Raja, Jiming Huang, Huan He, Fongci Lin, Jingcheng Du, Rui Zhang, W. Jim Zheng, Ron A. Adelman, Zhiyong Lu

TL;DR
This paper systematically evaluates large language models in biomedical NLP tasks, comparing their performance with traditional models, and provides practical insights and recommendations for their application in the biomedical domain.
Contribution
It offers a comprehensive benchmark of LLMs in BioNLP, highlighting their strengths, limitations, and the importance of fine-tuning for optimal performance.
Findings
Traditional fine-tuning outperforms zero-shot or few-shot LLMs in most tasks.
GPT-4 excels in reasoning-related biomedical tasks.
LLMs exhibit issues like hallucinations and missing information.
Abstract
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs (GPT and LLaMA representatives) on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with traditional fine-tuning of BERT or BART models. We examine inconsistencies, missing information, and hallucinations, and perform a cost analysis. Here we show that traditional fine-tuning outperforms zero-shot or few-shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Transformer
