Knowledge-Embedded Latent Projection for Robust Representation Learning
Weijing Tang, Ming Yuan, Zongqi Xia, Tianxi Cai

TL;DR
This paper introduces a knowledge-embedded latent projection model that leverages semantic embeddings to improve representation learning in high-dimensional, imbalanced EHR data matrices, with theoretical guarantees and practical validation.
Contribution
It proposes a novel method combining semantic embeddings with kernel PCA and scalable optimization, providing estimation error bounds and convergence guarantees.
Findings
Effective in handling imbalanced high-dimensional data
Improves representation quality with semantic side information
Validated through simulations and real EHR data
Abstract
Latent space models are widely used for analyzing high-dimensional discrete data matrices, such as patient-feature matrices in electronic health records (EHRs), by capturing complex dependence structures through low-dimensional embeddings. However, estimation becomes challenging in the imbalanced regime, where one matrix dimension is much larger than the other. In EHR applications, cohort sizes are often limited by disease prevalence or data availability, whereas the feature space remains extremely large due to the breadth of medical coding systems. Motivated by the increasing availability of external semantic embeddings, such as pre-trained embeddings of clinical concepts in EHRs, we propose a knowledge-embedded latent projection model that leverages semantic side information to regularize representation learning. Specifically, we model column embeddings as smooth functions of semantic…
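To make the idea of tying column embeddings to semantic side information concrete, here is a minimal sketch in Python. It is not the authors' estimator: it only illustrates one generic way to restrict feature (column) embeddings to a smooth subspace, by taking the leading kernel PCA directions of the semantic embeddings and projecting onto them. The RBF kernel, the bandwidth `gamma`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(Z, gamma):
    """Gaussian (RBF) kernel matrix between the rows of Z (assumed kernel choice)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca_basis(Z, n_components, gamma=1.0):
    """Orthonormal basis (p x n_components) from the top eigenvectors
    of the double-centered kernel matrix of the semantic embeddings Z."""
    K = rbf_kernel(Z, gamma)
    p = K.shape[0]
    H = np.eye(p) - np.ones((p, p)) / p      # centering matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top]

def project_columns(V, B):
    """Project column embeddings V (p x r) onto the span of B (p x m)."""
    return B @ (B.T @ V)

# Synthetic illustration: p features, each with a d-dimensional semantic embedding.
rng = np.random.default_rng(0)
p, d, m, r = 200, 5, 10, 3
Z = rng.normal(size=(p, d))                  # hypothetical semantic embeddings
B = kernel_pca_basis(Z, n_components=m, gamma=0.5)

V_true = B @ rng.normal(size=(m, r))         # embeddings lying in the smooth subspace
V_noisy = V_true + 0.5 * rng.normal(size=(p, r))
V_proj = project_columns(V_noisy, B)

err_noisy = np.linalg.norm(V_noisy - V_true)
err_projected = np.linalg.norm(V_proj - V_true)
```

In this toy setup the projection discards the noise component orthogonal to the kernel PCA subspace, so `err_projected` is smaller than `err_noisy`. The sketch reduces the p x r unknowns to m x r coefficients, which is the intuition behind using side information when the feature dimension is much larger than the cohort size.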
Taxonomy
Topics: Machine Learning in Healthcare · Imbalanced Data Classification Techniques · Topic Modeling
