What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits

TL;DR
This paper introduces MedQA, a large-scale, multilingual open-domain medical question answering dataset from professional exams, highlighting the current performance limitations of existing models and encouraging future advancements.
Contribution
It provides the first free-form multiple-choice medical OpenQA dataset from professional exams in three languages, facilitating research and development of more advanced models.
Findings
The best current method reaches only 36.7% test accuracy on the English questions, 42.0% on traditional Chinese, and 70.1% on simplified Chinese.
MedQA covers three languages: English, simplified Chinese, and traditional Chinese.
The dataset presents significant challenges for existing OpenQA systems.
Abstract
Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA…
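The abstract describes systems that chain a document retriever with a machine comprehension model. The sketch below illustrates that two-stage shape for multiple-choice questions. It is not the authors' implementation: the function names (tokenize, build_idf, retrieve, answer), the IDF-weighted overlap scoring, and the three-sentence corpus are all hypothetical, and the "reader" is a simple rule that sums retrieval scores rather than a neural model. The example question is invented and is not drawn from MedQA.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for each corpus term."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}


def retrieve(query, documents, idf, top_k=2):
    """Stage 1 (retriever): rank documents by IDF-weighted term overlap."""
    query_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        overlap = query_terms & set(tokenize(doc))
        scored.append((sum(idf.get(term, 0.0) for term in overlap), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def answer(question, options, documents, top_k=2):
    """Stage 2 (toy reader): pick the option whose evidence scores highest.

    Assumption: a rule-based stand-in for a machine comprehension model.
    """
    idf = build_idf(documents)
    best_label, best_score = None, float("-inf")
    for label, option in options.items():
        evidence = retrieve(f"{question} {option}", documents, idf, top_k)
        score = sum(s for s, _ in evidence)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical mini-corpus and question, for illustration only.
corpus = [
    "Iron deficiency anemia presents with fatigue, pallor, and low serum ferritin.",
    "Vitamin B12 deficiency causes megaloblastic anemia and peripheral neuropathy.",
    "Hypothyroidism presents with weight gain, cold intolerance, and elevated TSH.",
]
question = "A patient has fatigue, pallor, and low serum ferritin. What is the most likely diagnosis?"
options = {
    "A": "Iron deficiency anemia",
    "B": "Vitamin B12 deficiency",
    "C": "Hypothyroidism",
}

predicted = answer(question, options, corpus)
print(predicted)
```

Scoring each option separately against retrieved evidence keeps the retriever and the reader decoupled, so either stage could be swapped for a stronger component without touching the other.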
Code & Models
- 🤗 google/medgemma-1.5-4b-it · model · 86k dl · ♡ 536
- 🤗 google/medgemma-4b-it · model · 170k dl · ♡ 925
- 🤗 google/medsiglip-448 · model · 22k dl · ♡ 129
- 🤗 unsloth/medgemma-27b-it-GGUF · model · 4.4k dl · ♡ 38
- 🤗 google/medgemma-4b-pt · model · 1.1k dl · ♡ 148
- 🤗 google/medgemma-27b-text-it · model · 37k dl · ♡ 412
- 🤗 google/medgemma-27b-it · model · 107k dl · ♡ 330
- 🤗 pszemraj/medgemma-4b-it-heretic · model · 46 dl · ♡ 5
- 🤗 pszemraj/medgemma-27b-text-heretic_med · model · 11 dl · ♡ 5
- 🤗 unsloth/medgemma-1.5-4b-it-GGUF · model · 6.7k dl · ♡ 33
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
