S4CMDR: a metadata repository for electronic health records
Jiawei Zhao, Md Shamim Ahmed, Nicolai Dinh Khang Truong, Verena Schuster, Rudolf Mayer, Richard Röttger

TL;DR
S4CMDR is an open-source, standards-based metadata repository designed to catalog and discover compatible electronic health record data elements, facilitating large-scale, cross-clinical machine learning research.
Contribution
The paper introduces a novel microservice architecture and a middle-out standardisation approach for clinical metadata management, aimed at improving data compatibility and usability.
Findings
Supports on-premise and cloud deployment
Enables discovery of compatible EHR data sets
Validated through case studies on rare disease data
Abstract
Background: Electronic health records (EHRs) enable machine learning for diagnosis, prognosis, and clinical decision support. However, EHR standards vary by country and hospital, making records often incompatible. This limits large-scale and cross-clinical machine learning. To address such complexity, a metadata repository cataloguing available data elements, their value domains, and their compatibility is an essential tool. This allows researchers to leverage relevant data for tasks such as identifying undiagnosed rare disease patients.

Results: Within the Screen4Care project, we developed S4CMDR, an open-source metadata repository built on ISO 11179-3, based on a middle-out metadata standardisation approach. It automates cataloguing to reduce errors and enable the discovery of compatible feature sets across data registries. S4CMDR supports on-premise Linux deployment and cloud…
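The idea of cataloguing data elements with their value domains and then discovering compatible feature sets across registries can be illustrated with a minimal sketch. It assumes the ISO 11179 view of a data element as a concept (object class plus property) paired with a value domain. All class, field, and registry names below are hypothetical and are not S4CMDR's actual schema or API, and compatibility is simplified here to exact equality of concept and value domain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueDomain:
    """How values are represented (hypothetical, simplified)."""
    datatype: str
    unit: Optional[str] = None
    permitted_values: frozenset = frozenset()


@dataclass(frozen=True)
class DataElement:
    """A catalogued data element: concept (object class + property) + value domain."""
    registry: str        # which data registry the element comes from
    name: str            # local column / field name in that registry
    object_class: str    # e.g. "Patient"
    prop: str            # e.g. "age"
    value_domain: ValueDomain


def compatible(a: DataElement, b: DataElement) -> bool:
    """Simplifying assumption: compatible means same concept and same value domain."""
    same_concept = (a.object_class, a.prop) == (b.object_class, b.prop)
    return same_concept and a.value_domain == b.value_domain


def compatible_feature_sets(catalog: list[DataElement]) -> list[list[DataElement]]:
    """Group elements by (concept, value domain); keep groups spanning several registries."""
    groups: dict[tuple, list[DataElement]] = {}
    for element in catalog:
        key = (element.object_class, element.prop, element.value_domain)
        groups.setdefault(key, []).append(element)
    return [g for g in groups.values() if len({e.registry for e in g}) > 1]


# Hypothetical catalog with three registries describing patient age.
years = ValueDomain(datatype="integer", unit="years")
months = ValueDomain(datatype="integer", unit="months")
catalog = [
    DataElement("registry_a", "age_years", "Patient", "age", years),
    DataElement("registry_b", "patient_age", "Patient", "age", years),
    DataElement("registry_c", "age_months", "Patient", "age", months),
]

for group in compatible_feature_sets(catalog):
    print([f"{e.registry}.{e.name}" for e in group])
```

In this toy catalog only the two elements recorded in years are grouped together; the element recorded in months shares the concept but not the value domain, so a real repository would need an explicit conversion rule before treating it as compatible.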
Taxonomy
Topics: Genomics and Rare Diseases · Electronic Health Records Systems · Machine Learning in Healthcare
