HealthContradict: Evaluating Biomedical Knowledge Conflicts in Language Models
Boya Zhang, Alban Bornet, Rui Yang, Nan Liu, Douglas Teodoro

TL;DR
This paper introduces HealthContradict, a dataset for evaluating how language models answer health questions when given conflicting biomedical context. Models benefit from correct context but struggle to resist incorrect context.
Contribution
The paper presents a new dataset and evaluation framework for assessing how biomedical language models reason over conflicting contexts, showing how strongly their answers depend on the context they are given.
Findings
Models perform better with correct context than with conflicting information.
Fine-tuned biomedical models leverage correct context effectively.
Models struggle to resist incorrect conflicting contexts.
Abstract
How do language models use contextual information to answer health questions? How are their responses impacted by conflicting contexts? We assess the ability of language models to reason over long, conflicting biomedical contexts using HealthContradict, an expert-verified dataset comprising 920 unique instances, each consisting of a health-related question, a factual answer supported by scientific evidence, and two documents presenting contradictory stances. We consider several prompt settings, including correct, incorrect or contradictory context, and measure their impact on model outputs. Compared to existing medical question-answering evaluation benchmarks, HealthContradict provides greater distinctions of language models' contextual reasoning capabilities. Our experiments show that the strength of fine-tuned biomedical language models lies not only in their parametric knowledge from…
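The instance structure and prompt settings described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the field names, the prompt template, and the `no_context` baseline are assumptions, and the example strings are placeholders rather than dataset content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One benchmark item (hypothetical field names)."""
    question: str       # health-related question
    answer: str         # factual answer supported by evidence
    doc_correct: str    # document whose stance agrees with the answer
    doc_incorrect: str  # document whose stance contradicts the answer


# "correct", "incorrect" and "contradictory" follow the abstract;
# "no_context" is an assumed baseline added for comparison.
SETTINGS = ("no_context", "correct", "incorrect", "contradictory")


def context_documents(inst: Instance, setting: str) -> list[str]:
    """Return the documents shown to the model under a given setting."""
    if setting == "no_context":
        return []
    if setting == "correct":
        return [inst.doc_correct]
    if setting == "incorrect":
        return [inst.doc_incorrect]
    if setting == "contradictory":
        # Both stances are shown; the document order is an arbitrary choice here.
        return [inst.doc_correct, inst.doc_incorrect]
    raise ValueError(f"unknown setting: {setting!r}")


def build_prompt(inst: Instance, setting: str) -> str:
    """Assemble a plain-text prompt (assumed template) for one setting."""
    docs = context_documents(inst, setting)
    parts = [f"Document {i}:\n{doc}" for i, doc in enumerate(docs, start=1)]
    parts.append(f"Question: {inst.question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    example = Instance(
        question="<health question>",
        answer="<factual answer>",
        doc_correct="<supporting document>",
        doc_incorrect="<contradicting document>",
    )
    for setting in SETTINGS:
        print(f"--- {setting} ---")
        print(build_prompt(example, setting))
```

Comparing a model's accuracy across these settings on the same questions separates what it answers from parametric knowledge alone from what it takes from the supplied context.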
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
