Multi-context principal component analysis
Kexin Wang, Salil Bhate, João M. Pereira, Joe Kileel, Matylda Figlerowicz, Anna Seigal

TL;DR
MCPCA is a new framework extending PCA to identify shared and context-specific factors in multi-context data, demonstrated on gene expression and language models.
Contribution
We introduce MCPCA, a novel theoretical and algorithmic approach for decomposing data into shared and unique factors across multiple contexts.
Findings
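Reveals axes of variation in gene expression data that are shared across subsets of cancer types.
Identifies an axis whose variability in tumor cells, but not its mean, is associated with lung cancer progression.
Maps stages of a debate over decades using contextualized word embeddings from language models.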
Abstract
Principal component analysis (PCA) is a tool to capture factors that explain variation in data. Across domains, data are now collected across multiple contexts (for example, individuals with different diseases, cells of different types, or words across texts). While the factors explaining variation in data are undoubtedly shared across subsets of contexts, no tools currently exist to systematically recover such factors. We develop multi-context principal component analysis (MCPCA), a theoretical and algorithmic framework that decomposes data into factors shared across subsets of contexts. Applied to gene expression, MCPCA reveals axes of variation shared across subsets of cancer types and an axis whose variability in tumor cells, but not mean, is associated with lung cancer progression. Applied to contextualized word embeddings from language models, MCPCA maps stages of a debate on…
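To make the notion of a factor shared across contexts concrete, the sketch below runs ordinary PCA separately on two synthetic contexts and compares the resulting axes. It is not the MCPCA algorithm, which the abstract does not specify. It only illustrates the naive per-context baseline. The data, the helper names (`top_axes`, `axis_alignment`, `sample`), and all parameters are assumptions made for illustration.

```python
import numpy as np

def top_axes(X, k):
    """Return the top-k principal axes (as rows) of X via SVD of the centered data."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def axis_alignment(A, B):
    """Absolute cosine similarity between every axis in A and every axis in B."""
    return np.abs(A @ B.T)

rng = np.random.default_rng(0)
d, n = 20, 500  # hypothetical feature dimension and samples per context

# Three orthonormal directions: one shared, one specific to each context.
Q, _ = np.linalg.qr(rng.normal(size=(d, 3)))
shared, only_a, only_b = Q.T

def sample(directions, scales):
    """Draw n points varying along the given directions, plus small isotropic noise."""
    scores = rng.normal(size=(n, len(scales))) * scales
    noise = 0.1 * rng.normal(size=(n, d))
    return scores @ np.stack(directions) + noise

# Each synthetic context mixes the shared factor with its own specific factor.
X_a = sample([shared, only_a], np.array([3.0, 2.0]))
X_b = sample([shared, only_b], np.array([3.0, 2.0]))

axes_a = top_axes(X_a, 2)
axes_b = top_axes(X_b, 2)

# Entry (i, j) compares axis i of context A with axis j of context B.
alignment = axis_alignment(axes_a, axes_b)
print(np.round(alignment, 2))
```

In this toy setup the leading axes of the two contexts align closely while the second axes do not, which is what a shared factor plus context-specific factors would look like. Pairwise comparison of this kind becomes ad hoc with many contexts and with factors shared by only some of them, which is the gap the abstract says MCPCA addresses.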
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Gene expression and cancer classification · Bioinformatics and Genomic Networks
