MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
Adrien Bazoge

TL;DR
MediQAl is a French medical question answering dataset of 32,603 examination questions across 41 medical subjects, designed to evaluate language models' factual recall and reasoning over clinical scenarios and benchmarked with 14 models.
Contribution
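The paper introduces MediQAl, a large French medical QA dataset with three question formats (multiple-choice with a unique answer, multiple-choice with multiple answers, and open-ended short-answer) and an Understanding or Reasoning label on each question, filling a gap in multilingual medical NLP resources.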
Findings
Models show a significant performance gap between factual recall and reasoning tasks.
The evaluation covers 14 large language models, including recent reasoning-augmented models.
Per-question Understanding and Reasoning labels enable a detailed analysis of models' cognitive capabilities.
Abstract
This work introduces MediQAl, a French medical question answering dataset designed to evaluate the capabilities of language models in factual medical recall and reasoning over real-world clinical scenarios. MediQAl contains 32,603 questions sourced from French medical examinations across 41 medical subjects. The dataset includes three tasks: (i) Multiple-Choice Question with Unique answer, (ii) Multiple-Choice Question with Multiple answer, and (iii) Open-Ended Question with Short-Answer. Each question is labeled as Understanding or Reasoning, enabling a detailed analysis of models' cognitive capabilities. We validate the MediQAl dataset through extensive evaluation with 14 large language models, including recent reasoning-augmented models, and observe a significant performance gap between factual recall and reasoning tasks. Our evaluation provides a comprehensive benchmark for…
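The recall-versus-reasoning comparison described in the abstract can be illustrated with a minimal sketch that groups multiple-choice results by their Understanding or Reasoning label. Everything here is an assumption for illustration: the `McqItem` record, its field names, the task codes `mcqu` and `mcqm`, and the exact-set-match scoring rule are hypothetical and are not taken from the paper or from the released dataset schema. The toy items are invented and do not come from MediQAl.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class McqItem:
    """Hypothetical record for one multiple-choice question (field names are assumptions)."""
    task: str         # "mcqu" (unique answer) or "mcqm" (multiple answers)
    label: str        # "Understanding" or "Reasoning"
    gold: frozenset   # correct option letters, e.g. frozenset({"a", "c"})


def is_correct(item, predicted):
    """Exact-set match: all gold options selected and nothing else (an assumed metric)."""
    return frozenset(predicted) == item.gold


def accuracy_by_label(items, predictions):
    """Return {label: accuracy} for paired items and predicted option sets."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item, predicted in zip(items, predictions):
        totals[item.label] += 1
        hits[item.label] += is_correct(item, predicted)  # bool counts as 0 or 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy data, invented purely for illustration.
items = [
    McqItem("mcqu", "Understanding", frozenset({"b"})),
    McqItem("mcqu", "Reasoning", frozenset({"d"})),
    McqItem("mcqm", "Understanding", frozenset({"a", "c"})),
    McqItem("mcqm", "Reasoning", frozenset({"a", "e"})),
]
predictions = [{"b"}, {"a"}, {"a", "c"}, {"a", "e"}]

scores = accuracy_by_label(items, predictions)
gap = scores["Understanding"] - scores["Reasoning"]
```

Exact-set match is a strict choice for the multiple-answer task, since a prediction with one missing or one extra option scores zero. A partial-credit metric would give a different picture of the gap, and the open-ended short-answer task would need a text-similarity measure that this sketch does not cover.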
