A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks
Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang

TL;DR
This study evaluates four large language models on six biomedical tasks across 26 datasets, showing their potential in low-data scenarios and their variability across models and tasks, although they do not surpass fine-tuned models overall.
Contribution
First comprehensive evaluation of LLMs on biomedical datasets, comparing their zero-shot capabilities to fine-tuned models across diverse tasks.
Findings
Zero-shot LLMs outperform state-of-the-art fine-tuned biomedical models on datasets with smaller training sets.
Performance varies significantly among different LLMs and tasks.
LLMs show potential for biomedical tasks with limited annotated data.
Abstract
Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs on 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education
