MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations

Vishal Vivek Saley; Goonjan Saha; Rocktim Jyoti Das; Dinesh Raghu; Mausam

arXiv:2410.14204·cs.CL·October 21, 2024

TL;DR

MediTOD is a comprehensive English medical dialogue dataset with detailed annotations, designed to improve medical history-taking systems and facilitate research in dialogue understanding, policy learning, and generation.

Contribution

The paper introduces MediTOD, a new annotated dataset of medical dialogues that addresses two gaps in existing datasets: limited availability due to privacy regulations and the lack of detailed annotations.

Findings

1. Established benchmarks for natural language understanding (NLU), policy learning, and natural language generation (NLG) tasks.

2. Evaluated models on MediTOD in supervised and few-shot settings.

3. Released the dataset publicly for future research.

Abstract

Medical task-oriented dialogue systems can assist doctors by collecting patient medical history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout and expanding access to medical services. However, doctor-patient dialogue datasets are not readily available, primarily due to privacy regulations. Moreover, existing datasets lack comprehensive annotations involving medical slots and their different attributes, such as symptoms and their onset, progression, and severity. These comprehensive annotations are crucial for accurate diagnosis. Finally, most existing datasets are non-English, limiting their utility for the larger research community. In response, we introduce MediTOD, a new dataset of doctor-patient dialogues in English for the medical history-taking task. Collaborating with doctors, we devise a questionnaire-based labeling scheme tailored to…
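To make the idea of "medical slots and their different attributes" concrete, here is a minimal sketch of how one annotated turn might be represented. The class names, field names, and the sample dialogue are all hypothetical and invented for illustration; they are not the MediTOD schema or data, which this page does not show.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SlotAnnotation:
    """One medical slot mentioned in a turn (hypothetical schema)."""
    slot_type: str                      # e.g. "symptom"
    value: str                          # e.g. "cough"
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Turn:
    """A single utterance with its slot annotations (hypothetical schema)."""
    speaker: str                        # "doctor" or "patient"
    text: str
    slots: List[SlotAnnotation] = field(default_factory=list)


def symptoms_with_attribute(turns: List[Turn], attribute: str) -> List[Tuple[str, str]]:
    """Return (symptom, attribute value) pairs for symptoms carrying the attribute."""
    return [
        (slot.value, slot.attributes[attribute])
        for turn in turns
        for slot in turn.slots
        if slot.slot_type == "symptom" and attribute in slot.attributes
    ]


# Made-up example dialogue, not taken from the dataset.
dialogue = [
    Turn("doctor", "What brings you in today?"),
    Turn(
        "patient",
        "I've had a bad cough for three days and it's getting worse.",
        [
            SlotAnnotation(
                "symptom",
                "cough",
                {"onset": "three days ago", "progression": "worsening", "severity": "bad"},
            )
        ],
    ),
]

print(symptoms_with_attribute(dialogue, "onset"))
```

Keeping attributes as a per-slot mapping, rather than as separate top-level slots, is one way to keep onset, progression, and severity attached to the symptom they describe.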

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies