Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs
Rafael Brens, Yuqiao Meng, Luoxi Tang, Zhaohan Xi

TL;DR
This paper introduces a semantic NLP pipeline that converts unstructured EHR notes into interoperable, FHIR-compliant digital twin representations, enhancing healthcare data integration and clinical decision support.
Contribution
It presents a novel NLP-driven approach combining NER, concept normalization, and relation extraction to generate standardized digital twins from free-text EHRs.
Findings
High F1-scores for entity and relation extraction
Improved schema completeness and interoperability
Validated on the MIMIC-IV Clinical Database Demo against MIMIC-IV-on-FHIR reference mappings
Abstract
Digital twins -- virtual replicas of physical entities -- are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generating interoperable patient digital twins from unstructured electronic health records (EHRs) remains challenging due to variability in clinical documentation and lack of standardized mappings. This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant digital twin representations. The pipeline leverages named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED-CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations. Evaluation on MIMIC-IV Clinical Database Demo with validation against MIMIC-IV-on-FHIR reference mappings demonstrates high…
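To make the stages named in the abstract concrete, here is a minimal sketch of the extract, normalize, and emit-FHIR flow for conditions only. It is not the authors' implementation: the lookup table `TOY_LEXICON` is a hypothetical stand-in for a trained NER model and concept normalizer, the function names and patient identifier are invented for illustration, ICD-10-CM is used as the target vocabulary purely as an example, and relation extraction is omitted.

```python
import json
import re

# Hypothetical lookup table standing in for a trained NER model plus a
# concept normalizer. Maps a surface form to (ICD-10-CM code, display text).
TOY_LEXICON = {
    "type 2 diabetes": ("E11.9", "Type 2 diabetes mellitus without complications"),
    "hypertension": ("I10", "Essential (primary) hypertension"),
}

# FHIR code system URI for ICD-10-CM.
ICD10_CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"


def extract_conditions(note):
    """Toy NER step: return (surface form, start offset) pairs in text order."""
    hits = []
    lowered = note.lower()
    for term in TOY_LEXICON:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((term, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


def to_fhir_condition(term, patient_id):
    """Normalization + serialization: build a FHIR-style Condition dict."""
    code, display = TOY_LEXICON[term]
    return {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/" + patient_id},
        "code": {
            "coding": [
                {"system": ICD10_CM_SYSTEM, "code": code, "display": display}
            ],
            "text": term,  # keep the original mention for traceability
        },
    }


def note_to_conditions(note, patient_id):
    """Run the toy pipeline end to end on a single free-text note."""
    return [
        to_fhir_condition(term, patient_id)
        for term, _offset in extract_conditions(note)
    ]


if __name__ == "__main__":
    note = "Patient has a history of hypertension and type 2 diabetes."
    print(json.dumps(note_to_conditions(note, "example-1"), indent=2))
```

A real pipeline would replace the dictionary lookup with learned models and would validate the output against the FHIR specification; the sketch only shows how extracted mentions can carry a standard code and a patient reference through to a structured resource.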
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems
