A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization
Sarvesh Soni, Dina Demner-Fushman

TL;DR
This paper introduces ArchEHR-QA, a new expert-annotated dataset of patient questions and clinical notes from ICU and emergency settings, to evaluate AI models' ability to generate accurate, relevant answers grounded in EHR data.
Contribution
The creation of ArchEHR-QA, to the authors' knowledge the first dataset that captures patient information needs in the context of their EHRs, with sentence-level relevance annotations on clinical notes, and the benchmarking of LLMs for grounded EHR question answering.
Findings
Answer-first prompting yields the best performance.
Llama 4 outperforms other models in factuality and relevance.
Common errors include omitted evidence and hallucinated content.
Abstract
Patients have distinct information needs about their hospitalization that can be addressed using clinical evidence from electronic health records (EHRs). While artificial intelligence (AI) systems show promise in meeting these needs, robust datasets are needed to evaluate the factual accuracy and relevance of AI-generated responses. To our knowledge, no existing dataset captures patient information needs in the context of their EHRs. We introduce ArchEHR-QA, an expert-annotated dataset based on real-world patient cases from intensive care unit and emergency department settings. The cases comprise questions posed by patients to public health forums, clinician-interpreted counterparts, relevant clinical note excerpts with sentence-level relevance annotations, and clinician-authored answers. To establish benchmarks for grounded EHR question answering (QA), we evaluated three open-weight…
