CURE: A Dataset for Clinical Understanding & Retrieval Evaluation
Nadia Athar Sheikh, Daniel Buades Marcos, Anne-Laure Jousse, Akintunde Oladipo, Olivier Rousseau, Jimmy Lin

TL;DR
CURE is a domain-specific test dataset for evaluating passage retrieval systems intended for healthcare providers, covering one monolingual (English) and two cross-lingual (French/Spanish -> English) conditions.
Contribution
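The paper introduces CURE, an ad-hoc retrieval test dataset for passage ranking, built with medical professionals and containing 2000 queries spanning 10 medical domains. It addresses the shortage of test sets for retrieval systems used by healthcare providers in a point-of-care setting.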
Findings
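The paper reports baseline results to showcase CURE's effectiveness as an evaluation tool.
CURE supports one monolingual condition (English) and two cross-lingual conditions (French/Spanish -> English).
The dataset is published under a Creative Commons Attribution Non Commercial 4.0 license and can be accessed on Hugging Face and as a retrieval task on MTEB.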
Abstract
Given the dominance of dense retrievers that do not generalize well beyond their training dataset distributions, domain-specific test sets are essential in evaluating retrieval. There are few test datasets for retrieval systems intended for use by healthcare providers in a point-of-care setting. To fill this gap we have collaborated with medical professionals to create CURE, an ad-hoc retrieval test dataset for passage ranking with 2000 queries spanning 10 medical domains with a monolingual (English) and two cross-lingual (French/Spanish -> English) conditions. In this paper, we describe how CURE was constructed and provide baseline results to showcase its effectiveness as an evaluation tool. CURE is published with a Creative Commons Attribution Non Commercial 4.0 license and can be accessed on Hugging Face and as a retrieval task on MTEB.
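To make concrete what a passage-ranking test set is used for, the sketch below scores ranked passage lists against relevance judgments with nDCG@10. This is an illustration under assumptions: the summary above does not say which metric the paper reports (nDCG@10 is the usual main score for MTEB retrieval tasks), and the query IDs, passage IDs, judgments, and function names are all hypothetical toy values, not CURE data or any library's API.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for a list of graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids, judged, k=10):
    """nDCG@k for one query.

    ranked_ids: passage IDs in the order the retriever returned them.
    judged: dict mapping passage ID -> relevance grade (unjudged counts as 0).
    """
    gains = [judged.get(pid, 0) for pid in ranked_ids[:k]]
    ideal = sorted(judged.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy judgments and retriever output (not taken from CURE).
qrels = {
    "q1": {"p1": 1, "p3": 1},
    "q2": {"p2": 1},
}
runs = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p4", "p2"],
}

# Score each query, then average over the query set.
scores = {qid: ndcg_at_k(runs[qid], qrels[qid]) for qid in qrels}
mean_ndcg = sum(scores.values()) / len(scores)
print(f"mean nDCG@10 over {len(scores)} queries: {mean_ndcg:.4f}")
```

In a cross-lingual condition the same judgments would apply; only the query text changes language, so one scoring routine could serve all three conditions.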
Taxonomy
Topics: Information Retrieval and Search Behavior · Health Literacy and Information Accessibility · Topic Modeling
Methods: Network On Network
