Representation Learning with Blockwise Missingness and Signal Heterogeneity
Ziqi Liu, Ye Tian, Weijing Tang

TL;DR
This paper introduces APPCA, a robust representation learning method for multi-source data with blockwise missingness and signal heterogeneity, improving embedding accuracy in complex real-world scenarios.
Contribution
We develop APPCA, a framework that handles structured blockwise missingness and signal heterogeneity, with theoretical guarantees and empirical validation on simulations and single-cell sequencing data.
Findings
APPCA outperforms existing methods in simulations.
APPCA achieves accurate embeddings in single-cell sequencing data.
Theoretical bounds show robustness to signal heterogeneity.
Abstract
Unified representation learning for multi-source data integration faces two important challenges: blockwise missingness and blockwise signal heterogeneity. The former arises from sources observing different, yet potentially overlapping, feature sets, while the latter involves varying signal strengths across subject groups and feature sets. While existing methods perform well with fully observed data or uniform signal strength, their performance degenerates when these two challenges coincide, which is common in practice. To address this, we propose Anchor Projected Principal Component Analysis (APPCA), a general framework for representation learning with structured blockwise missingness that is robust to signal heterogeneity. APPCA first recovers robust group-specific column spaces using all observed feature sets, and then aligns them by projecting shared "anchor" features onto these…
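The two steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy two-group layout with overlapping feature sets, uses a plain truncated SVD as a stand-in for the paper's robust column-space estimator (which this summary does not specify), and stops at the projection of the anchor features, because the rest of the alignment step is cut off in the abstract above. All names (top_r_column_space, project_anchor, observed, strength) and all sizes are hypothetical.

```python
import numpy as np


def top_r_column_space(X, r):
    """Return an orthonormal basis (n x r) for the leading r-dim column space of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r]


def project_anchor(U, X_anchor):
    """Orthogonally project the anchor-feature columns onto span(U)."""
    return U @ (U.T @ X_anchor)


rng = np.random.default_rng(0)
r, p = 3, 30                      # assumed latent rank and total feature count
V = rng.normal(size=(p, r))       # feature loadings shared by all groups

# Blockwise missingness: each group observes a different, overlapping feature set.
observed = {0: np.arange(0, 20), 1: np.arange(10, 30)}
anchor = np.intersect1d(observed[0], observed[1])   # features seen by both groups

# Signal heterogeneity: the second group has a much weaker signal.
sizes = {0: 60, 1: 80}
strength = {0: 1.0, 1: 0.3}

bases, projected = {}, {}
for g, cols in observed.items():
    Z = rng.normal(size=(sizes[g], r))               # latent subject factors
    noise = 0.1 * rng.normal(size=(sizes[g], len(cols)))
    X_g = strength[g] * Z @ V[cols].T + noise        # observed block for group g

    # Step 1 (stand-in): group-specific column space from all observed features.
    bases[g] = top_r_column_space(X_g, r)

    # Step 2 (first part only): project the shared anchor features onto it.
    anchor_pos = np.searchsorted(cols, anchor)       # anchor columns within X_g
    projected[g] = project_anchor(bases[g], X_g[:, anchor_pos])
```

Estimating each group's column space from every feature it observes, rather than from the anchor block alone, is the design point the abstract emphasizes: the weaker group borrows strength from its non-anchor features before the anchor features are used for alignment.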
Taxonomy
Topics: Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
