MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings
Jean-Philippe Corbeil, Minseon Kim, Maxime Griot, Sheela Agarwal, Alessandro Sordoni, Francois Beaulieu, Paul Vozila

TL;DR
MedRiskEval introduces a comprehensive benchmark for evaluating medical language models, emphasizing user perspectives and patient safety to promote safer deployment in healthcare.
Contribution
The paper presents MedRiskEval, a novel risk evaluation benchmark including a patient-oriented dataset, addressing safety concerns for diverse healthcare user groups.
Findings
Evaluated multiple LLMs on the new benchmark
Identified safety risks across different user perspectives
Provided insights for safer medical AI deployment
Abstract
As the performance of large language models (LLMs) continues to advance, their adoption in the medical domain is increasing. However, most existing risk evaluations have focused largely on general safety benchmarks. In medical applications, LLMs may be used by a wide range of users with diverse levels of expertise, from general users and patients to clinicians, and their outputs can directly affect human health, which raises serious safety concerns. In this paper, we introduce MedRiskEval, a risk evaluation benchmark tailored to the medical domain. To fill the gap left by previous benchmarks that focused only on the clinician perspective, we introduce a new patient-oriented dataset, PatientSafetyBench, containing 466 samples across 5 critical risk categories. Leveraging our new benchmark alongside existing datasets, we evaluate a variety of open- and…
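To illustrate how results on a benchmark organized by risk category might be summarized, here is a minimal Python sketch. It is not the authors' evaluation code: the `ScoredSample` type, the `mean_score_by_category` helper, the placeholder category names, and the numeric score scale are all assumptions made for illustration, since the abstract does not specify the categories or the scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredSample:
    category: str  # hypothetical risk-category label (not from the paper)
    score: float   # hypothetical per-response score from some judge


def mean_score_by_category(samples):
    """Return {category: mean score} so per-category differences are visible."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        totals[s.category] += s.score
        counts[s.category] += 1
    return {c: totals[c] / counts[c] for c in totals}


# Toy data only; these are not results from the paper.
samples = [
    ScoredSample("category_a", 1.0),
    ScoredSample("category_a", 3.0),
    ScoredSample("category_b", 2.0),
]
print(mean_score_by_category(samples))
```

Reporting a mean per category, rather than one pooled number, keeps a weak category from being hidden by strong ones.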
Code & Models
- 🤗 microsoft/MediPhi (model) · 4.2k dl · ♡ 19
- 🤗 microsoft/MediPhi-PubMed (model) · 155 dl · ♡ 9
- 🤗 microsoft/MediPhi-MedWiki (model) · 35 dl · ♡ 3
- 🤗 microsoft/MediPhi-Instruct (model) · 4.8k dl · ♡ 61
- 🤗 microsoft/MediPhi-MedCode (model) · 74 dl · ♡ 6
- 🤗 microsoft/MediPhi-Clinical (model) · 418 dl · ♡ 12
- 🤗 microsoft/MediPhi-Guidelines (model) · 34 dl · ♡ 4
- 🤗 gabriellarson/MediPhi-Instruct-GGUF (model) · 34 dl · ♡ 2
- 🤗 Mungert/MediPhi-Instruct-GGUF (model) · 250 dl
- 🤗 prathamesh-chavan/MediPhi-MedCode-bnb-4bit (model)
