Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions
Sungrim Moon, Huan He, Hongfang Liu, Jungwei W. Fan

TL;DR
This paper introduces RxWhyQA, a large clinical question-answering dataset with multi-answer and multi-focus questions, to advance NLP systems in handling complex, realistic clinical inquiries.
Contribution
The creation of the RxWhyQA dataset, which incorporates complex multi-answer and multi-focus questions derived from clinical drug-reason relations and fills a gap in existing datasets for clinical EQA.
Findings
A baseline model achieved an F1 score of 0.72 on the dataset.
25% of questions require multiple answers.
90% of relevant terms occur within adjacent sentences.
Abstract
Background: Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can involve multiple answers to a single question and multiple focus points in one question, features that are lacking in the existing datasets for the development of artificial intelligence solutions. Objective: Create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions. Methods: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multi-answer and multi-focus QA entries, which represent more complex and natural challenges in addition to the basic…
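To make the relation cardinalities in the Methods concrete, the sketch below groups (drug, reason) pairs into connected components of a drug-reason graph and labels each group as 1-to-1, 1-to-N (one drug, several reasons: a multi-answer question), M-to-1 (several drugs, one reason: a multi-focus question), or M-to-N. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`group_relations`, `cardinality`, `build_qa_entries`), the question template "Why was the patient prescribed ...?", and the use of plain strings instead of annotated character spans are all hypothetical.

```python
from collections import defaultdict


def group_relations(pairs):
    """Group (drug, reason) pairs into connected components of a bipartite graph."""
    # Tag nodes by type so a drug and a reason with identical text never merge.
    adj = defaultdict(set)
    for drug, reason in pairs:
        d, r = ("drug", drug), ("reason", reason)
        adj[d].add(r)
        adj[r].add(d)

    seen, groups = set(), []
    for start in list(adj):  # insertion order keeps the output deterministic
        if start in seen:
            continue
        seen.add(start)
        stack, drugs, reasons = [start], [], []
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            kind, text = node
            (drugs if kind == "drug" else reasons).append(text)
            for nxt in sorted(adj[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append((sorted(drugs), sorted(reasons)))
    return groups


def cardinality(drugs, reasons):
    """Label a group by how many drugs (M) and reasons (N) it links."""
    m, n = len(drugs), len(reasons)
    if m == 1 and n == 1:
        return "1-to-1"
    if m == 1:
        return "1-to-N"  # one focus, multiple answers
    if n == 1:
        return "M-to-1"  # multiple foci, one answer
    return "M-to-N"


def build_qa_entries(pairs):
    """Turn each relation group into one QA entry (hypothetical question template)."""
    entries = []
    for drugs, reasons in group_relations(pairs):
        question = f"Why was the patient prescribed {' and '.join(drugs)}?"
        entries.append({"question": question, "answers": reasons,
                        "type": cardinality(drugs, reasons)})
    return entries


if __name__ == "__main__":
    # Toy relations, invented for illustration (not from the n2c2 corpus).
    pairs = [
        ("prednisone", "asthma"), ("prednisone", "rash"),
        ("aspirin", "stroke"), ("clopidogrel", "stroke"),
        ("metoprolol", "hypertension"), ("metoprolol", "atrial fibrillation"),
        ("lisinopril", "hypertension"),
        ("insulin", "diabetes"),
    ]
    for entry in build_qa_entries(pairs):
        print(entry["type"], "|", entry["question"], "->", entry["answers"])
```

Grouping by connected component is one simple way to decide which drugs share a question; the actual dataset may split M-to-N relations differently.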
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies
