Spherical Mixture Integration for Latent Embedding Alignment across Multi-Source Feature Spaces
Yuming Zhang, Congyuan Duan, Dong Xia, Doudou Zhou, Tianxi Cai

TL;DR
This paper introduces SMILE, a method for aligning and harmonizing multi-source clinical data embeddings to improve multi-institutional EHR analysis.
Contribution
It proposes a novel spherical mixture model with weak supervision for embedding alignment and provides theoretical guarantees for its effectiveness.
Findings
Enhanced alignment of heterogeneous EHR data sources.
Theoretical error bounds for latent embedding recovery.
Improved synonym clustering demonstrated in simulations and real data.
Abstract
Multi-institutional electronic health record (multi-EHR) data have emerged as a powerful resource for developing predictive models to support clinical decisions and for generating reliable real-world evidence. By aggregating information from diverse patient populations and institutions, they enhance the robustness and generalizability of models and findings. However, analyzing multi-EHR data remains challenging because disparate institutions rarely map all data elements to a common ontology, and raw EHR codes are often overly granular and institution-specific, fragmenting representations of the same clinical concept. Hence, integrative analysis must overcome two key hurdles: harmonizing codes with the same clinical meaning (synonymy), and aligning institutional feature spaces. To address these challenges, we propose SMILE, a Spherical Mixture Integration for Latent Embedding alignment across…
