Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

Mingchen Li; Zaifu Zhan; Han Yang; Yongkang Xiao; Jiatan Huang; Rui Zhang

arXiv:2405.08151 · cs.CL · November 17, 2025 · 2 cites

TL;DR

This paper systematically evaluates retrieval-augmented large language models in biomedical NLP, focusing on their performance, robustness, and self-awareness across multiple tasks and datasets.

Contribution

It introduces a comprehensive evaluation framework and testbeds for assessing the capabilities and robustness of retrieval-augmented LLMs (RALs) in biomedical NLP tasks, addressing the lack of rigorous evaluation in prior research.

Findings

01. RALs improve performance on biomedical NLP tasks.
02. Robustness varies with retrieval methods and task types.
03. Self-awareness aspects influence model reliability.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging demonstrations within the input context to adapt to new tasks. However, LLMs are sensitive to the selection of demonstrations. To address the hallucination issue inherent in LLMs, retrieval-augmented LLMs (RALs) offer a solution by retrieving pertinent information from an established database. Nonetheless, existing research lacks rigorous evaluation of the impact of retrieval-augmented large language models on different biomedical NLP tasks. This deficiency makes it challenging to ascertain the capabilities of RALs within the biomedical domain. Moreover, RAL outputs are affected by retrieving unlabeled, counterfactual, or diverse knowledge, which is not well studied in the biomedical domain. However, such knowledge is common in the…
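To make the retrieval-augmented setup concrete, here is a minimal sketch. It uses a toy bag-of-words cosine retriever that selects demonstrations from a labeled corpus and assembles them into an in-context prompt. It also builds counterfactual (label-swapped) and unlabeled copies of that corpus, which are the kinds of retrieved knowledge whose effect the abstract says is understudied. The retriever, the corpus, the labels, and the helper names (`retrieve`, `build_prompt`, `counterfactual`, `unlabeled`) are all hypothetical illustrations. They are not the authors' retriever or testbed construction.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical (text, label) pairs; label is None for unlabeled knowledge.
Example = tuple[str, Optional[str]]

def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; a stand-in for a real biomedical retriever.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[Example], k: int = 2) -> list[Example]:
    # Rank corpus entries by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda ex: cosine(q, Counter(tokenize(ex[0]))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, demos: list[Example], instruction: str) -> str:
    # Labeled entries become input/output demonstrations; unlabeled ones become plain context.
    parts = [instruction]
    for text, label in demos:
        parts.append(f"Input: {text}\nOutput: {label}" if label is not None else f"Context: {text}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

def counterfactual(corpus: list[Example], swap: dict[str, str]) -> list[Example]:
    # Hypothetical counterfactual corpus: same texts, deliberately wrong labels.
    return [(text, swap.get(label, label) if label is not None else None) for text, label in corpus]

def unlabeled(corpus: list[Example]) -> list[Example]:
    # Hypothetical unlabeled corpus: same texts with labels removed.
    return [(text, None) for text, _ in corpus]

CORPUS: list[Example] = [
    ("Patient developed rash after starting amoxicillin.", "adverse_effect"),
    ("Metformin was continued without complications.", "no_adverse_effect"),
    ("Severe nausea was reported following cisplatin infusion.", "adverse_effect"),
]
SWAP = {"adverse_effect": "no_adverse_effect", "no_adverse_effect": "adverse_effect"}
QUERY = "Patient developed nausea after starting cisplatin."
INSTRUCTION = "Does the text describe an adverse drug effect?"

if __name__ == "__main__":
    for name, corpus in [("clean", CORPUS), ("counterfactual", counterfactual(CORPUS, SWAP)),
                         ("unlabeled", unlabeled(CORPUS))]:
        print(f"--- {name} ---")
        print(build_prompt(QUERY, retrieve(QUERY, corpus, k=2), INSTRUCTION))
```

Because only the labels (or their presence) differ across the three corpora, the same query retrieves the same texts each time. Any change in an LLM's answer to these prompts can therefore be attributed to the retrieved knowledge rather than to retrieval ranking.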

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies