Representation Learning with Blockwise Missingness and Signal Heterogeneity

Ziqi Liu; Ye Tian; Weijing Tang

arXiv:2602.11511 · stat.ME · February 13, 2026

TL;DR

This paper introduces Anchor Projected Principal Component Analysis (APPCA), a representation learning method for multi-source data with blockwise missingness that is robust to signal heterogeneity, and reports accurate embeddings in simulations and on single-cell sequencing data.

Contribution

We develop APPCA, a novel framework that effectively handles structured blockwise missingness and heterogeneity, with theoretical guarantees and practical validation.

Findings

01 APPCA outperforms existing methods in simulations.

02 APPCA achieves accurate embeddings in single-cell sequencing data.

03 Theoretical bounds show robustness to signal heterogeneity.

Abstract

Unified representation learning for multi-source data integration faces two important challenges: blockwise missingness and blockwise signal heterogeneity. The former arises from sources observing different, yet potentially overlapping, feature sets, while the latter involves varying signal strengths across subject groups and feature sets. While existing methods perform well with fully observed data or uniform signal strength, their performance degenerates when these two challenges coincide, which is common in practice. To address this, we propose Anchor Projected Principal Component Analysis (APPCA), a general framework for representation learning with structured blockwise missingness that is robust to signal heterogeneity. APPCA first recovers robust group-specific column spaces using all observed feature sets, and then aligns them by projecting shared "anchor" features onto these…
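
The setting described in the abstract can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the sizes, the two-group layout, the choice of features 0 to 9 as shared "anchor" features, the per-group signal strengths, and the helper name `group_column_space`. A plain truncated SVD stands in for the paper's robust column-space estimator, and the alignment step is not shown because the abstract is truncated before describing it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 groups of 50 subjects, 30 features, rank-2 signal.
n_per_group, p, r = 50, 30, 2

# Blockwise missingness: each group (source) observes its own feature set.
# Features 0..9 are observed by both groups and play the role of "anchors".
observed = {
    0: np.arange(0, 20),
    1: np.concatenate([np.arange(0, 10), np.arange(20, 30)]),
}

# Blockwise signal heterogeneity: group 0 has a strong signal, group 1 a weak one.
signal = {0: 5.0, 1: 1.0}

# Shared orthonormal loadings (p x r) used by both groups.
V = np.linalg.qr(rng.standard_normal((p, r)))[0]

# Full data matrix with NaN marking unobserved blocks.
X = np.full((2 * n_per_group, p), np.nan)
group_rows = {}
for g, cols in observed.items():
    rows = np.arange(g * n_per_group, (g + 1) * n_per_group)
    group_rows[g] = rows
    scores = signal[g] * rng.standard_normal((n_per_group, r))
    full = scores @ V.T + rng.standard_normal((n_per_group, p))
    X[np.ix_(rows, cols)] = full[:, cols]  # keep only the observed block


def group_column_space(X, rows, cols, rank):
    """Top-`rank` left singular vectors of one group's observed block.

    Illustrative stand-in only; not the estimator proposed in the paper.
    """
    block = X[np.ix_(rows, cols)]
    U, _, _ = np.linalg.svd(block, full_matrices=False)
    return U[:, :rank]


# One orthonormal basis per group, computed from all features that group observes.
bases = {g: group_column_space(X, group_rows[g], observed[g], r) for g in observed}

# Features shared by both groups.
anchors = np.intersect1d(observed[0], observed[1])
```

Each entry of `bases` is an orthonormal basis estimated from one group's observed features only, so the bases of different groups live in different coordinate systems. The shared `anchors` columns are the only information common to both groups, which is why the abstract assigns them the alignment role.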

Taxonomy

Topics: Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis