MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations
Vishal Vivek Saley, Goonjan Saha, Rocktim Jyoti Das, Dinesh Raghu, Mausam

TL;DR
MediTOD is a comprehensive English medical dialogue dataset with detailed annotations, designed to improve medical history-taking systems and facilitate research in dialogue understanding, policy learning, and generation.
Contribution
The paper introduces MediTOD, a new annotated dataset of medical dialogues that addresses two gaps in existing datasets: limited availability due to privacy regulations and the lack of detailed annotations.
Findings
Established benchmarks for NLU, policy, and NLG tasks.
Demonstrated effectiveness of models on MediTOD in supervised and few-shot settings.
Released the dataset publicly for future research.
Abstract
Medical task-oriented dialogue systems can assist doctors by collecting patient medical history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout and expanding access to medical services. However, doctor-patient dialogue datasets are not readily available, primarily due to privacy regulations. Moreover, existing datasets lack comprehensive annotations involving medical slots and their different attributes, such as symptoms and their onset, progression, and severity. These comprehensive annotations are crucial for accurate diagnosis. Finally, most existing datasets are non-English, limiting their utility for the larger research community. In response, we introduce MediTOD, a new dataset of doctor-patient dialogues in English for the medical history-taking task. Collaborating with doctors, we devise a questionnaire-based labeling scheme tailored to…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
