Knowledge-Embedded Latent Projection for Robust Representation Learning

Weijing Tang; Ming Yuan; Zongqi Xia; Tianxi Cai

arXiv:2602.16709·cs.LG·February 19, 2026

TL;DR

This paper introduces a knowledge-embedded latent projection model that leverages semantic embeddings to improve representation learning in high-dimensional, imbalanced EHR data matrices, with theoretical guarantees and practical validation.

Contribution

The paper combines semantic embeddings with kernel PCA and a scalable optimization procedure, and provides estimation error bounds and convergence guarantees.

Findings

01 Effective in handling imbalanced high-dimensional data

02 Improves representation quality with semantic side information

03 Validated through simulations and real EHR data

Abstract

Latent space models are widely used for analyzing high-dimensional discrete data matrices, such as patient-feature matrices in electronic health records (EHRs), by capturing complex dependence structures through low-dimensional embeddings. However, estimation becomes challenging in the imbalanced regime, where one matrix dimension is much larger than the other. In EHR applications, cohort sizes are often limited by disease prevalence or data availability, whereas the feature space remains extremely large due to the breadth of medical coding systems. Motivated by the increasing availability of external semantic embeddings, such as pre-trained embeddings of clinical concepts in EHRs, we propose a knowledge-embedded latent projection model that leverages semantic side information to regularize representation learning. Specifically, we model column embeddings as smooth functions of semantic…
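
As a rough illustration of the idea in the abstract (column embeddings modeled as smooth functions of semantic embeddings, with kernel PCA mentioned in the contribution summary), the sketch below builds a kernel PCA basis from semantic embeddings and expresses the column embeddings as a projection onto that basis. This is not the authors' algorithm: the RBF kernel, the bandwidth choice, the dimensions, the random data, and every name (`rbf_kernel`, `kernel_pca_basis`, `S`, `Phi`, `B`, `V`) are assumptions made only for this example.

```python
import numpy as np


def rbf_kernel(S, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of S (assumed kernel choice)."""
    sq = np.sum(S ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * S @ S.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def kernel_pca_basis(K, m):
    """Return the top-m eigenvectors of the double-centered kernel matrix."""
    p = K.shape[0]
    H = np.eye(p) - np.ones((p, p)) / p  # centering matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, top]


rng = np.random.default_rng(0)
p, d, m, r = 200, 16, 10, 3  # features, semantic dim, basis size, latent dim

# Hypothetical semantic embeddings, one row per feature (e.g. per clinical code).
S = rng.normal(size=(p, d))

K = rbf_kernel(S, bandwidth=np.sqrt(d))
Phi = kernel_pca_basis(K, m)  # p x m orthonormal basis

# Column embeddings constrained to the span of the kernel PCA basis:
# only the m x r coefficients B are free, not all p x r entries.
B = rng.normal(size=(m, r))
V = Phi @ B  # p x r column embeddings
```

Under these assumptions, the number of free parameters for the column side drops from p × r to m × r, which is one way semantic side information can regularize estimation when the feature dimension is much larger than the cohort size.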

Taxonomy

Topics: Machine Learning in Healthcare · Imbalanced Data Classification Techniques · Topic Modeling