MeDiaQA: A Question Answering Dataset on Medical Dialogues

Huqun Suri; Qi Zhang; Wenhua Huo; Yan Liu; Chunsheng Guan

arXiv:2108.08074 · cs.CL · August 19, 2021 · 1 cites

TL;DR

MeDiaQA is a new large-scale multiple-choice question answering dataset built on real multi-turn medical dialogues and designed to evaluate reasoning and understanding. The MeDia-BERT baseline reaches 64.3% accuracy against 93% for humans, leaving substantial room for improvement.

Contribution

The paper introduces MeDiaQA, which the authors describe as the first QA dataset for reasoning over medical dialogues, especially their quantitative content, and proposes MeDia-BERT as a baseline model for this challenging task.

Findings

01

MeDiaQA contains 22k human-annotated multiple-choice questions over more than 11k dialogues (120k utterances) covering 150 disease specialties.

02

MeDia-BERT achieves 64.3% accuracy, below human performance of 93%.

03

The dataset is intended to test the computing, reasoning, and understanding abilities of models across multi-turn medical dialogues.

Abstract

In this paper, we introduce MeDiaQA, a novel question answering (QA) dataset constructed from real online medical dialogues. It contains 22k human-annotated multiple-choice questions for over 11k dialogues with 120k utterances between patients and doctors, covering 150 disease specialties, collected from haodf.com and dxy.com. MeDiaQA is the first QA dataset that requires reasoning over medical dialogues, especially their quantitative content. The dataset has the potential to test the computing, reasoning, and understanding abilities of models across multi-turn dialogues, which is challenging compared with existing datasets. To address these challenges, we design MeDia-BERT, which achieves 64.3% accuracy, while human performance is 93% accuracy, indicating that there remains considerable room for improvement.
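The multiple-choice setup described in the abstract can be pictured with a minimal sketch. The field names (`speaker`, `options`, `answer_index`) and the toy dialogue are assumptions made for illustration only; they are not the dataset's actual schema or contents, which the abstract does not specify. The `accuracy` function shows the plain fraction-correct metric that figures such as 64.3% and 93% would correspond to under this reading.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    speaker: str  # hypothetical label, e.g. "patient" or "doctor"
    text: str


@dataclass
class MultipleChoiceItem:
    dialogue: List[Utterance]  # the multi-turn conversation
    question: str
    options: List[str]         # candidate answers
    answer_index: int          # index of the correct option in `options`


def accuracy(items: Sequence[MultipleChoiceItem],
             predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted option index matches the gold index."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        1 for item, pred in zip(items, predictions)
        if item.answer_index == pred
    )
    return correct / len(items)


# Invented toy item (not taken from MeDiaQA) showing a quantitative question:
# 2 tablets x 3 times a day x 5 days = 30 tablets.
toy_item = MultipleChoiceItem(
    dialogue=[
        Utterance("patient", "I took 2 tablets three times a day for 5 days."),
        Utterance("doctor", "Please stop the tablets and come in for a check-up."),
    ],
    question="How many tablets did the patient take in total?",
    options=["10", "15", "30", "60"],
    answer_index=2,
)
```

Calling `accuracy([toy_item], [2])` scores a single correct prediction; a model would supply one predicted option index per question in the same order as the items.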

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems