LLM Robustness Against Misinformation in Biomedical Question Answering

Alexander Bondarenko; Adrian Viehweger

arXiv:2410.21330·cs.CL·October 30, 2024

Open Access · 1 Repo

TL;DR

This study evaluates how robust four large language models are to misinformation in biomedical question answering, measuring their answer accuracy with no context, with correct context, and under prompt-injection attacks that supply incorrect context.

Contribution

The paper analyzes the accuracy of four LLMs in biomedical question answering and their robustness to misinformation injected into the prompt.

Findings

01. Llama 3.1 achieves the highest accuracy in vanilla and perfect RAG scenarios.

02. Perfect RAG reduces accuracy gaps between models, enhancing robustness.

03. Llama 3.1 is the most effective adversary in generating malicious context.

Abstract

Retrieval-augmented generation (RAG) is used to reduce the confabulation of large language models (LLMs) in question answering by retrieving additional context from external knowledge sources and providing it to the model (e.g., by adding it to the prompt). However, injecting incorrect information can mislead the LLM into generating an incorrect answer. In this paper, we evaluate the effectiveness and robustness against misinformation of four LLMs (Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral) in answering biomedical questions. We assess answer accuracy on yes-no and free-form questions in three scenarios: vanilla LLM answers (no context is provided), "perfect" augmented generation (correct context is provided), and prompt-injection attacks (incorrect context is provided). Our results show that Llama 3.1 (70B parameters) achieves the highest accuracy in both vanilla…
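The three scenarios named in the abstract can be illustrated with a minimal sketch for yes-no questions. Everything in it is an assumption made for illustration: the prompt wording, the helper names `build_prompt`, `accuracy`, and `toy_model`, and the two toy items are hypothetical and are not taken from the paper or its repository. `toy_model` is a deterministic stand-in for a real LLM call that simply trusts whatever context it is given.

```python
from typing import Callable, Optional

# Hypothetical toy items; not drawn from the paper's dataset.
ITEMS = [
    {
        "question": "Is insulin produced in the pancreas?",
        "answer": "yes",
        "correct_context": "Insulin is produced by beta cells in the pancreas.",
        "incorrect_context": "Insulin is not produced in the pancreas.",
    },
    {
        "question": "Do antibiotics treat viral infections?",
        "answer": "no",
        "correct_context": "Antibiotics do not act on viruses.",
        "incorrect_context": "Antibiotics reliably cure viral infections.",
    },
]


def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Build a yes-no prompt; the wording is an assumption, not the paper's."""
    instruction = "Answer the question with 'yes' or 'no'."
    if context is None:
        # Vanilla scenario: the model sees only the question.
        return f"{instruction}\nQuestion: {question}\nAnswer:"
    # "Perfect" RAG and attack scenarios differ only in the context supplied.
    return f"{instruction}\nContext: {context}\nQuestion: {question}\nAnswer:"


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: says 'yes' without context, else follows the context."""
    context_lines = [ln for ln in prompt.splitlines() if ln.startswith("Context: ")]
    if not context_lines:
        return "yes"
    return "no" if " not " in context_lines[0] else "yes"


def accuracy(model: Callable[[str], str], items: list,
             context_key: Optional[str]) -> float:
    """Fraction of items whose normalized model answer matches the gold answer."""
    correct = 0
    for item in items:
        context = item[context_key] if context_key else None
        prediction = model(build_prompt(item["question"], context)).strip().lower()
        correct += prediction == item["answer"]
    return correct / len(items)


results = {
    "vanilla": accuracy(toy_model, ITEMS, None),
    "perfect": accuracy(toy_model, ITEMS, "correct_context"),
    "attack": accuracy(toy_model, ITEMS, "incorrect_context"),
}
print(results)
```

Because the stand-in model follows its context blindly, it scores perfectly with correct context and fails on every item under attack. A real evaluation would replace `toy_model` with calls to the LLMs under test and would need a separate scoring method for free-form answers, which this sketch does not cover.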

Code & Models

Repositories

alebondarenko/llm-robustness (Official)

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Misinformation and Its Impacts

Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Layer Normalization · Residual Connection · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay