BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine

Mingchen Li; Halil Kilicoglu; Hua Xu; Rui Zhang

arXiv:2405.00465·cs.CL·May 6, 2024

TL;DR

BiomedRAG introduces a simple retrieval-augmented approach for biomedical NLP tasks, directly inputting retrieved documents into LLMs to improve accuracy and reduce hallucinations across multiple datasets.

Contribution

It proposes a straightforward method for retrieval-augmented LLMs in biomedicine that bypasses complex mechanisms and enables LLM supervision of retrieval models, enhancing performance.

Findings

01. Achieves superior performance on 5 biomedical NLP tasks.

02. Outperforms existing triple extraction systems with high micro-F1 scores.

03. Effectively reduces noise in retrieved documents during tasks.

Abstract

Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provides a solution for these models to update knowledge and enhance their performance. In contrast to previous retrieval-augmented LMs, which utilize specialized cross-attention mechanisms to help the LLM encode retrieved text, BiomedRAG adopts a simpler approach by directly inputting the retrieved chunk-based documents into the LLM. This straightforward design is easily applicable to existing retrieval and language models, effectively bypassing noisy information in retrieved documents, particularly in noise-intensive tasks. Moreover, we demonstrate the potential for utilizing the LLM to supervise the retrieval model in the…
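
The idea of passing retrieved chunks directly to the LLM as input text, rather than through a specialized cross-attention module, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`chunk_text`, `overlap_score`, `retrieve`, `build_prompt`), the fixed-size word chunking, the prompt template, and the word-overlap scoring are all assumptions made for illustration. In particular, the overlap score is a simple stand-in for a trained retrieval model.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def overlap_score(query: str, chunk: str) -> int:
    """Count word tokens shared by query and chunk (hypothetical stand-in for a learned retriever)."""
    q = Counter(re.findall(r"\w+", query.lower()))
    c = Counter(re.findall(r"\w+", chunk.lower()))
    return sum((q & c).values())


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks with the highest overlap score, best first."""
    return sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Place the retrieved chunks directly in the prompt text given to the LLM."""
    context = "\n".join(f"- {ch}" for ch in retrieved)
    return f"Context:\n{context}\n\nInput: {query}\nOutput:"


if __name__ == "__main__":
    corpus = ["aspirin inhibits cox enzymes", "metformin lowers blood glucose"]
    query = "What does aspirin inhibit?"
    print(build_prompt(query, retrieve(query, corpus, k=1)))
```

The resulting prompt string would then be sent to whichever language model is in use. Because the retrieved text enters only through the prompt, the retriever and the LLM can be swapped independently, which matches the abstract's claim that the design applies to existing retrieval and language models.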

Code & Models

Repositories

toneli/petailor-for-bio-triple-extraction (PyTorch, official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling