Probing Pre-Trained Language Models for Disease Knowledge
Israa Alghanmi, Luis Espinosa-Anke, Steven Schockaert

TL;DR
This paper introduces DisKnE, a new benchmark for evaluating disease knowledge and reasoning in pre-trained language models, revealing their limited medical reasoning capabilities.
Contribution
The paper presents DisKnE, a novel benchmark with adversarial negative examples and disease-specific splits to better assess medical reasoning in language models.
Findings
Pre-trained models perform poorly on the new benchmark.
Standard benchmarks such as MedNLI contain relatively few examples that require medical reasoning, so they may overestimate models' reasoning abilities.
DisKnE highlights the need for improved medical reasoning evaluation.
Abstract
Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as mapping symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To better understand the medical reasoning capabilities of existing language models, in this paper we introduce DisKnE, a new benchmark for Disease Knowledge Evaluation. To construct this benchmark, we annotated each positive MedNLI example with the types of medical reasoning that are needed. We then created negative examples by corrupting these positive examples in an adversarial way. Furthermore, we define training-test splits per disease, ensuring that no knowledge about test diseases can be…
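The per-disease training-test splits described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `per_disease_splits`, the `disease` field, and the toy examples are all hypothetical, and the sketch assumes each example is tagged with exactly one disease.

```python
# Minimal sketch of per-disease train/test splits (hypothetical, not DisKnE's code).
# Assumption: every example carries a single "disease" tag.

# Toy examples invented purely for illustration.
EXAMPLES = [
    {"premise": "Patient reports fever and productive cough.",
     "hypothesis": "The patient has pneumonia.", "disease": "pneumonia"},
    {"premise": "Patient has wheezing that improves with an inhaler.",
     "hypothesis": "The patient has asthma.", "disease": "asthma"},
    {"premise": "Patient has elevated fasting blood glucose.",
     "hypothesis": "The patient has diabetes.", "disease": "diabetes"},
    {"premise": "Chest X-ray shows a lobar infiltrate.",
     "hypothesis": "The patient has pneumonia.", "disease": "pneumonia"},
]


def per_disease_splits(examples):
    """Yield (disease, train, test) triples, one per disease.

    The test set holds every example about the target disease, and the
    training set holds only examples about other diseases, so nothing
    about the test disease is seen during training.
    """
    diseases = sorted({ex["disease"] for ex in examples})
    for target in diseases:
        test = [ex for ex in examples if ex["disease"] == target]
        train = [ex for ex in examples if ex["disease"] != target]
        yield target, train, test


if __name__ == "__main__":
    for disease, train, test in per_disease_splits(EXAMPLES):
        print(disease, "train:", len(train), "test:", len(test))
```

A real benchmark would also need to check that the test disease is not mentioned in the text of any training example, which this sketch does not do.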
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
