CDE-Mapper: Using Retrieval-Augmented Language Models for Linking Clinical Data Elements to Controlled Vocabularies
Komal Gilani, Marlo Verket, Christof Peters, Michel Dumontier, Hans-Peter Brunner-La Rocca, Visara Urovi

TL;DR
CDE-Mapper is a novel framework that uses retrieval-augmented language models to accurately link clinical data elements to controlled vocabularies, enhancing data standardization and interoperability in healthcare.
Contribution
It introduces a retrieval-augmented generation approach with query decomposition, expert rule integration, and human-in-the-loop validation for improved clinical data element linking.
Findings
Achieved 7.2% higher accuracy than baseline methods.
Validated effectiveness across four diverse datasets.
Reduced computational costs with a knowledge reservoir.
Abstract
The standardization of clinical data elements (CDEs) aims to ensure consistent and comprehensive patient information across various healthcare systems. Existing methods often falter when standardizing CDEs of varying representation and complex structure, impeding data integration and interoperability in clinical research. We introduce CDE-Mapper, an innovative framework that leverages a Retrieval-Augmented Generation approach combined with Large Language Models to automate the linking of CDEs to controlled vocabularies. Our modular approach features query decomposition to manage varying levels of CDE complexity, integrates expert-defined rules within prompt engineering, and employs in-context learning alongside multiple retriever components to resolve terminological ambiguities. In addition, we propose a knowledge reservoir validated by a human-in-the-loop approach, achieving accurate…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling
