LLM Robustness Against Misinformation in Biomedical Question Answering
Alexander Bondarenko, Adrian Viehweger

TL;DR
This study evaluates the robustness of four large language models in biomedical QA against misinformation, examining their accuracy and vulnerability to prompt-injection attacks in various contexts.
Contribution
It provides a comprehensive analysis of LLMs' accuracy and robustness against misinformation and adversarial attacks in biomedical question answering.
Findings
Llama 3.1 achieves the highest accuracy in vanilla and perfect RAG scenarios.
Perfect RAG reduces accuracy gaps between models, enhancing robustness.
Llama 3.1 is the most effective adversary in generating malicious context.
Abstract
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering by retrieving and providing additional context coming from external knowledge sources (e.g., by adding the context to the prompt). However, injecting incorrect information can mislead the LLM to generate an incorrect answer. In this paper, we evaluate the effectiveness and robustness of four LLMs against misinformation - Gemma 2, GPT-4o-mini, Llama~3.1, and Mixtral - in answering biomedical questions. We assess the answer accuracy on yes-no and free-form questions in three scenarios: vanilla LLM answers (no context is provided), "perfect" augmented generation (correct context is provided), and prompt-injection attacks (incorrect context is provided). Our results show that Llama 3.1 (70B parameters) achieves the highest accuracy in both vanilla…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Expert finding and Q&A systems · Misinformation and Its Impacts
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Layer Normalization · Residual Connection · Weight Decay · Byte Pair Encoding · Linear Warmup With Linear Decay
