Linking Electronic Health Records for Multiple Sclerosis Research: Comparative Study of Deterministic, Probabilistic, and Machine Learning Linkage Methods
Ohoud Almadani, Yasser Albogami, Adel Alrwisan

TL;DR
This study compares different data linkage methods for connecting electronic health records of multiple sclerosis patients in Saudi Arabia, finding probabilistic methods to be the most effective and efficient.
Contribution
The paper introduces a comparative evaluation of deterministic, probabilistic, and machine learning linkage methods in a real-world MS research context.
Findings
Probabilistic linkage outperformed deterministic and machine learning methods in balancing recall and precision.
Machine learning achieved the highest F1 score (99.8%) but was computationally expensive.
Deterministic linkage was fast but had lower match rates compared to probabilistic methods.
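The findings above are stated in terms of precision, recall, and F1. As a minimal sketch of how these metrics are commonly computed for record linkage (not the authors' evaluation code), the function below compares a set of predicted links against a set of known true links. The function name, the pair representation, and the toy identifiers are all hypothetical.

```python
def linkage_metrics(predicted_pairs, true_pairs):
    """Precision, recall, and F1 for predicted record-pair links.

    Each pair is a (record_id_in_source_a, record_id_in_source_b) tuple.
    This representation is an assumption made for illustration.
    """
    predicted = set(predicted_pairs)
    truth = set(true_pairs)

    tp = len(predicted & truth)   # links that are predicted and true
    fp = len(predicted - truth)   # predicted links that are wrong
    fn = len(truth - predicted)   # true links that were missed

    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (hypothetical identifiers): 4 true links, 3 predicted, 2 correct.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}

metrics = linkage_metrics(predicted_pairs, true_pairs)
print(metrics)
```

In this toy case two of the three predicted links are correct (precision 2/3) and two of the four true links are found (recall 1/2), so F1, the harmonic mean of the two, is 4/7.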
Abstract
Data linkage in pharmacoepidemiological research is commonly employed to ascertain exposures and outcomes or to obtain additional information on confounding variables. However, to protect patient confidentiality, unique patient identifiers are not provided, which makes data linkage across multiple sources challenging. The Saudi Real-World Evidence Network (SRWEN) aggregates electronic health records from various hospitals, which may require robust linkage techniques. We aimed to evaluate and compare the performance of deterministic, probabilistic, and machine learning (ML) approaches for linking deidentified data of patients with multiple sclerosis (MS) from the SRWEN and Ministry of National Guard Health Affairs electronic health record systems. A simulation-based validation framework was applied before linking real-world data sources. Deterministic linkage was based on predefined…
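To illustrate the distinction the abstract draws between deterministic and probabilistic linkage, here is a minimal sketch under stated assumptions. It is not the study's method: the matching fields (`birth_year`, `sex`, `city`), the m- and u-probabilities, and the decision threshold are all hypothetical, and the probabilistic part uses a simplified Fellegi-Sunter style agreement weight as a stand-in for whatever model the authors used.

```python
import math

# Hypothetical matching fields; the variables used in the study are not
# given in the text above.
FIELDS = ("birth_year", "sex", "city")

# Assumed probabilities, chosen only for illustration:
# m = P(field agrees | records belong to the same patient)
# u = P(field agrees | records belong to different patients)
M_PROB = {"birth_year": 0.95, "sex": 0.98, "city": 0.90}
U_PROB = {"birth_year": 0.02, "sex": 0.50, "city": 0.10}


def deterministic_match(rec_a, rec_b):
    """Deterministic rule: link only if every field agrees exactly."""
    return all(rec_a[f] == rec_b[f] for f in FIELDS)


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter style)."""
    total = 0.0
    for f in FIELDS:
        m, u = M_PROB[f], U_PROB[f]
        if rec_a[f] == rec_b[f]:
            total += math.log2(m / u)               # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))   # disagreement subtracts
    return total


def probabilistic_match(rec_a, rec_b, threshold=2.0):
    """Link if the total weight reaches an (assumed) threshold."""
    return match_weight(rec_a, rec_b) >= threshold


# Two toy records that disagree on one field only.
rec_a = {"birth_year": 1985, "sex": "F", "city": "Riyadh"}
rec_b = {"birth_year": 1985, "sex": "F", "city": "Jeddah"}

print(deterministic_match(rec_a, rec_b))
print(probabilistic_match(rec_a, rec_b))
```

With these assumed weights, a single disagreement on `city` blocks the deterministic rule, while the probabilistic score still clears the threshold because agreement on `birth_year` carries strong evidence. This is one way a probabilistic approach can recover links that exact matching misses, at the cost of having to choose probabilities and a threshold.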
Taxonomy
Topics: Machine Learning in Healthcare · Pharmacovigilance and Adverse Drug Reactions · Data Quality and Management
