DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Jayetri Bardhan; Anthony Colas; Kirk Roberts; Daisy Zhe Wang

arXiv:2205.01290 · cs.AI · May 4, 2022

TL;DR

DrugEHRQA is a new dataset with over 70,000 medication-related question-answer pairs from structured tables and unstructured clinical notes, designed to advance multi-modal question answering in electronic health records.

Contribution

The paper introduces the first comprehensive EHR question answering dataset combining structured and unstructured data, and proposes baseline models including a modality selection network and the use of RAT-SQL for complex queries.

Findings

01 Dataset contains over 70,000 QA pairs.

02 Baseline model uses modality selection for answer routing.

03 First application of RAT-SQL to EHR data.

Abstract

This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from a publicly available Electronic Health Record (EHR). EHRs contain patient records, stored in structured tables and unstructured clinical notes. The information in structured and unstructured EHRs is not strictly disjoint: information may be duplicated, contradictory, or provide additional context between these sources. Our dataset has medication-related queries, containing over 70,000 question-answer pairs. To provide a baseline model and help analyze the dataset, we have used a simple model (MultimodalEHRQA) which uses the predictions of a modality selection network to choose between EHR tables and clinical notes to answer the questions. This is used to direct the questions to the table-based or text-based state-of-the-art QA…
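The routing idea described in the abstract can be illustrated with a minimal sketch. It assumes a selector that labels each question as "table" or "text" and two interchangeable QA backends. Every name here (`route_question`, `toy_selector`, `toy_table_qa`, `toy_text_qa`) is hypothetical, and the keyword rule is only a stand-in for a trained modality selection network; this is not the authors' MultimodalEHRQA implementation.

```python
from typing import Callable, Tuple

# Hypothetical type aliases: a selector maps a question to a modality label,
# an answerer maps a question to an answer string.
Selector = Callable[[str], str]
Answerer = Callable[[str], str]


def route_question(
    question: str,
    select_modality: Selector,
    table_qa: Answerer,
    text_qa: Answerer,
) -> Tuple[str, str]:
    """Send the question to the QA backend chosen by the selector."""
    modality = select_modality(question)
    if modality == "table":
        return modality, table_qa(question)
    if modality == "text":
        return modality, text_qa(question)
    raise ValueError(f"unknown modality: {modality!r}")


# Toy stand-ins (assumptions for illustration only, not the paper's models).
def toy_selector(question: str) -> str:
    # A keyword rule in place of a learned classifier.
    return "table" if "dosage" in question.lower() else "text"


def toy_table_qa(question: str) -> str:
    # Placeholder for a table-based QA model over structured EHR tables.
    return "answer from structured tables"


def toy_text_qa(question: str) -> str:
    # Placeholder for a text-based QA model over clinical notes.
    return "answer from clinical notes"


if __name__ == "__main__":
    for q in ("What is the dosage of aspirin?",
              "Why was the patient prescribed aspirin?"):
        print(route_question(q, toy_selector, toy_table_qa, toy_text_qa))
```

Passing the selector and both backends as plain callables keeps the router independent of any particular model, so a trained classifier or a real table or text QA system could be substituted without changing the routing logic.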

Code & Models

Repositories

jayetri/DrugEHRQA-A-Question-Answering-Dataset-on-Structured-and-Unstructured-Electronic-Health-Records
none · Official
