TL;DR
This paper introduces a large, EHR-based benchmark dataset for biomedical concept relatedness, intended to test how well AI methods retrieve related medical concepts from electronic health records.
Contribution
It provides the first large-scale, EHR-derived benchmark dataset for biomedical concept relatedness, addressing limitations of previous small, hand-picked datasets.
Findings
The dataset is six times larger than existing ones.
It includes diverse concept types from EHRs.
State-of-the-art models find it a challenging benchmark.
Abstract
A promising application of AI to healthcare is the retrieval of information from electronic health records (EHRs), e.g. to aid clinicians in finding relevant information for a consultation or to recruit suitable patients for a study. This requires search capabilities far beyond simple string matching, including the retrieval of concepts (diagnoses, symptoms, medications, etc.) related to the one in question. The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores. However, all existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs. We open-source a novel concept relatedness benchmark overcoming these issues: it is six times larger than existing datasets and concept pairs are chosen based on co-occurrence in EHRs, ensuring their relevance for the…
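The evaluation idea described in the abstract (predict a relatedness score for each concept pair, then compare against the known scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the use of cosine similarity over concept embeddings, the choice of Spearman rank correlation as the agreement measure, and every concept name, vector, and score below are hypothetical and invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(embeddings, scored_pairs):
    """Correlate model-predicted relatedness with reference scores."""
    predicted = [cosine(embeddings[a], embeddings[b]) for a, b, _ in scored_pairs]
    reference = [score for _, _, score in scored_pairs]
    return spearman(predicted, reference)

# Hypothetical toy embeddings and reference scores (illustration only).
toy_embeddings = {
    "diabetes": [1.0, 0.0],
    "insulin": [0.9, 0.1],
    "fracture": [0.0, 1.0],
    "cast": [0.2, 0.8],
}
toy_pairs = [
    ("diabetes", "insulin", 0.9),
    ("fracture", "cast", 0.8),
    ("diabetes", "fracture", 0.1),
    ("insulin", "cast", 0.2),
]

if __name__ == "__main__":
    print(evaluate(toy_embeddings, toy_pairs))
```

A rank-based measure is used in this sketch because it only asks whether the model orders pairs the same way the reference scores do, without requiring the two scales to match.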
