Large Language Models for Biomedical Article Classification
Jakub Proboszcz, Paweł Cichosz

TL;DR
This paper systematically evaluates large language models as classifiers for biomedical articles, comparing a range of configurations and prompting methods, and finds that they perform comparably to traditional classifiers on challenging datasets.
Contribution
It provides a comprehensive analysis of LLMs as classifiers in biomedical domains, including prompt types, output processing, and few-shot strategies, with practical recommendations.
Findings
Average PR AUC above 0.4 for zero-shot prompting
Nearly 0.5 PR AUC for few-shot prompting
Performance close to that of traditional classifiers such as naïve Bayes and random forest
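The findings above are reported as PR AUC (area under the precision-recall curve). As a rough illustration of what such a number measures, here is a minimal sketch of the common step-wise "average precision" estimate for one binary dataset. The function name is hypothetical, and the paper may use a different PR AUC estimator. Note that a random ranking scores roughly the positive-class rate, which is why PR AUC values near 0.5 can be meaningful on imbalanced datasets.

```python
def average_precision(y_true, scores):
    """Step-wise PR AUC: sum over thresholds of (R_n - R_{n-1}) * P_n.

    y_true: sequence of 0/1 labels; scores: predicted positive-class scores.
    Tied scores are treated as a single threshold.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples by decreasing score.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every example sharing this score before evaluating the threshold.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # ≈ 0.8333
```

An average over several datasets, like the one quoted in the findings, would then be the mean of per-dataset values such as this one.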
Abstract
This work presents a systematic, in-depth investigation of the utility of large language models as text classifiers for biomedical article classification. The study uses several small and mid-size open-source models, as well as selected closed-source ones, and is more comprehensive than most prior work in the scope of evaluated configurations: different types of prompts, output processing methods for generating both class and class probability predictions, and few-shot example counts and selection methods. The performance of the most successful configurations is compared to that of conventional classification algorithms. The average PR AUC obtained over 15 challenging datasets, above 0.4 for zero-shot prompting and nearly 0.5 for few-shot prompting, comes close to that of the naïve Bayes classifier (0.5), the random forest algorithm (0.5 with default settings or…
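The abstract mentions output processing methods that yield class probability predictions, which PR AUC requires. The abstract does not say how this is done. One plausible approach, sketched here purely as an assumption, is to prompt for a "yes"/"no" answer, read the log-probabilities of those two answer tokens, and renormalize them into P(positive). The token strings, the function names, and the 0.5 fallback are all hypothetical. Real tokenizers may also emit variants such as " Yes", which would need normalizing first.

```python
import math
from typing import Mapping

NEG_INF = float("-inf")


def positive_probability(token_logprobs: Mapping[str, float],
                         pos_token: str = "yes", neg_token: str = "no") -> float:
    """Hypothetical: renormalize two answer-token log-probs into P(positive)."""
    # A token absent from the returned log-probs is treated as probability 0.
    lp_pos = token_logprobs.get(pos_token, NEG_INF)
    lp_neg = token_logprobs.get(neg_token, NEG_INF)
    if lp_pos == lp_neg == NEG_INF:
        return 0.5  # hypothetical fallback: neither answer token observed
    # Subtract the max before exponentiating for numerical stability.
    m = max(lp_pos, lp_neg)
    p_pos = math.exp(lp_pos - m)
    p_neg = math.exp(lp_neg - m)
    return p_pos / (p_pos + p_neg)


def predict_class(p_pos: float, threshold: float = 0.5) -> int:
    """Hard class prediction (1 = relevant article) from the probability."""
    return int(p_pos >= threshold)


p = positive_probability({"yes": math.log(0.6), "no": math.log(0.2)})
print(round(p, 4), predict_class(p))  # 0.75 1
```

Renormalizing over just the two answer tokens discards probability mass the model puts on other continuations. That is one of several design choices such an output-processing step has to make.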
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Text and Document Classification Technologies
