LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction

Aishik Nagar; Viktor Schlegel; Thanh-Tung Nguyen; Hao Li; Yuping Wu; Kuluhan Binici; Stefan Winkler

arXiv:2408.12249·cs.CL·May 20, 2025

Open Access

TL;DR

This paper systematically benchmarks large language models on biomedical information extraction tasks, revealing that standard prompting outperforms complex reasoning techniques and highlighting the need for better external knowledge integration.

Contribution

It provides a comprehensive evaluation of LLMs on biomedical tasks, showing limitations of current reasoning methods and emphasizing the importance of external knowledge integration.

Findings

01

Standard prompting outperforms Chain-of-Thought and RAG techniques.

02

Complex reasoning methods do not improve biomedical extraction performance.

03

External knowledge integration remains a key challenge.

Abstract

Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs' task knowledge and reasoning capabilities, their (parametric) domain knowledge, and addition of external knowledge. To this end, we evaluate various open LLMs - including BioMistral and Llama-2 models - on a diverse set of biomedical datasets, using standard…

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · WordPiece · Residual Connection · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Dropout · Adam