K-QA: A Real-World Medical Q&A Benchmark

Itay Manes; Naama Ronn; David Cohen; Ran Ilan Ber; Zehavi Horowitz-Kugler; Gabriel Stanovsky

arXiv:2401.14493 · cs.CL · January 29, 2024 · 1 citation

TL;DR

This paper introduces K-QA, a medical question-answering dataset built from real patient questions, and evaluates large language models with two NLI-based metrics, comprehensiveness and hallucination rate. It reports improvements from in-context learning and retrieval augmentation.

Contribution

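The paper presents K-QA, a dataset of 1,212 patient questions from real-world conversations on K Health, together with physician-written answers for a subset and metrics for scoring model responses against them. It then uses these to assess model performance, showing benefits from in-context learning and retrieval methods.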

Findings

01. In-context learning enhances answer comprehensiveness.

02. Retrieval augmentation reduces hallucinations.

03. The K-QA dataset is publicly available for research.

Abstract

Ensuring the accuracy of responses provided by large language models (LLMs) is crucial, particularly in clinical settings where incorrect information may directly impact patient health. To address this challenge, we construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health (an AI-driven clinical platform). We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements. Additionally, we formulate two NLI-based evaluation metrics approximating recall and precision: (1) comprehensiveness, measuring the percentage of essential clinical information in the generated answer and (2) hallucination rate, measuring the number of statements from the physician-curated response contradicted by the LLM answer. Finally, we use K-QA along with these metrics to evaluate several…
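The two metrics described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Statement` class, the `essential` flag, the label strings, and the pluggable `judge` callable are all hypothetical names introduced here, and the lookup-table judge merely stands in for a real NLI model. The sketch assumes that comprehensiveness is the percentage of essential physician statements entailed by the generated answer, and that hallucination rate is the count of physician statements the answer contradicts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical label set for the NLI judge; not taken from the paper.
ENTAILS, CONTRADICTS, NEUTRAL = "entails", "contradicts", "neutral"


@dataclass(frozen=True)
class Statement:
    """One self-contained statement from a physician-curated answer."""
    text: str
    essential: bool  # assumption: physicians flag which statements are essential


# A judge maps (premise, hypothesis) to one of the three labels above.
NLIJudge = Callable[[str, str], str]


def comprehensiveness(answer: str, statements: Sequence[Statement], judge: NLIJudge) -> float:
    """Percentage of essential statements that the answer entails (recall-like)."""
    essential = [s for s in statements if s.essential]
    if not essential:
        return 0.0
    hits = sum(judge(answer, s.text) == ENTAILS for s in essential)
    return 100.0 * hits / len(essential)


def hallucination_count(answer: str, statements: Sequence[Statement], judge: NLIJudge) -> int:
    """Number of physician statements that the answer contradicts (precision-like)."""
    return sum(judge(answer, s.text) == CONTRADICTS for s in statements)


def make_lookup_judge(table: Dict[str, str]) -> NLIJudge:
    """Toy stand-in for an NLI model: looks up a fixed label per hypothesis."""
    return lambda premise, hypothesis: table.get(hypothesis, NEUTRAL)


# Toy data, invented purely for illustration.
statements = [
    Statement("a", essential=True),
    Statement("b", essential=True),
    Statement("c", essential=False),
]
judge = make_lookup_judge({"a": ENTAILS, "c": CONTRADICTS})

score = comprehensiveness("generated answer", statements, judge)
contradicted = hallucination_count("generated answer", statements, judge)
```

In practice the `judge` argument would wrap an actual NLI model or an LLM prompted to classify entailment. Keeping it as a plain callable lets the scoring logic be checked independently of that choice.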

Code & Models

Repositories

itaymanes/k-qa (official)

Datasets

Itaykhealth/K-QA
dataset · 38 downloads

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Text Readability and Simplification