Multi-context principal component analysis

Kexin Wang; Salil Bhate; João M. Pereira; Joe Kileel; Matylda Figlerowicz; Anna Seigal

arXiv:2601.15239 · stat.ML · January 22, 2026

Open Access

TL;DR

MCPCA is a new framework extending PCA to identify shared and context-specific factors in multi-context data, demonstrated on gene expression and language models.

Contribution

We introduce MCPCA, a novel theoretical and algorithmic approach for decomposing data into shared and unique factors across multiple contexts.

Findings

01

Reveals axes of variation shared across subsets of cancer types in gene expression data.

02

Identifies an axis whose variability in tumor cells, but not its mean, is associated with lung cancer progression.

03

Maps stages of a debate over decades, using contextualized word embeddings from language models.

Abstract

Principal component analysis (PCA) is a tool to capture factors that explain variation in data. Across domains, data are now collected across multiple contexts (for example, individuals with different diseases, cells of different types, or words across texts). While the factors explaining variation in data are undoubtedly shared across subsets of contexts, no tools currently exist to systematically recover such factors. We develop multi-context principal component analysis (MCPCA), a theoretical and algorithmic framework that decomposes data into factors shared across subsets of contexts. Applied to gene expression, MCPCA reveals axes of variation shared across subsets of cancer types and an axis whose variability in tumor cells, but not mean, is associated with lung cancer progression. Applied to contextualized word embeddings from language models, MCPCA maps stages of a debate on…
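
As a rough illustration of the setting the abstract describes, the sketch below runs ordinary PCA separately in two synthetic contexts and compares the resulting subspaces through principal angles. This is not the MCPCA algorithm, which this page does not specify; the data, the helper names `pca_components` and `subspace_overlap`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def pca_components(X, k):
    """Top-k principal directions (as rows) of X, shaped samples x features."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def subspace_overlap(V1, V2):
    """Cosines of the principal angles between two row spaces.

    Assumes the rows of V1 and V2 are orthonormal.
    """
    return np.linalg.svd(V1 @ V2.T, compute_uv=False)

rng = np.random.default_rng(0)
d, n = 20, 500                      # hypothetical feature and sample counts
shared = np.eye(d)[0]               # one factor present in both contexts
specific = [np.eye(d)[1], np.eye(d)[2]]  # one factor unique to each context

contexts = []
for u in specific:
    s_shared = rng.normal(scale=5.0, size=(n, 1))
    s_specific = rng.normal(scale=3.0, size=(n, 1))
    noise = rng.normal(scale=0.5, size=(n, d))
    contexts.append(s_shared * shared + s_specific * u + noise)

# Separate PCA per context, then compare the two 2-dimensional subspaces.
components = [pca_components(X, k=2) for X in contexts]
cosines = subspace_overlap(components[0], components[1])
print(cosines)
```

In this toy setup, one cosine close to 1 indicates a direction common to both contexts, while one close to 0 indicates directions that are specific to a single context. Comparing per-context PCA results pairwise like this does not scale cleanly to factors shared across arbitrary subsets of many contexts, which is the gap the abstract says MCPCA addresses.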

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Gene expression and cancer classification · Bioinformatics and Genomic Networks