A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

Israt Jahan; Md Tahmid Rahman Laskar; Chun Peng; Jimmy Huang

arXiv:2310.04270 · cs.CL · February 21, 2024 · 1 citation

TL;DR

This study evaluates large language models on biomedical tasks. It shows their potential in low-data scenarios and highlights substantial variability across models and tasks, although LLMs do not surpass fine-tuned models in general.

Contribution

First comprehensive evaluation of LLMs on biomedical datasets, comparing their zero-shot capabilities to fine-tuned models across diverse tasks.

Findings

1. Zero-shot LLMs outperform fine-tuned models on datasets with small training sets.

2. Performance varies significantly across LLMs and tasks.

3. LLMs show potential for biomedical tasks with limited annotated data.

Abstract

Recently, Large Language Models (LLMs) have demonstrated an impressive ability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has yet investigated their capability in the biomedical domain. To this end, this paper evaluates the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs on 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, our evaluation shows that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in…

Code & Models

Repositories

tahmedge/llm-eval-biomed (official)

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education