Nonlinear multi-study factor analysis
Gemma E. Moran, Anandi Krishnan

TL;DR
This paper introduces a nonlinear multi-study factor model using a sparse variational autoencoder to identify shared and study-specific factors in high-dimensional data, demonstrated on gene expression data.
Contribution
It proposes a novel nonlinear multi-study factor model with a sparse variational autoencoder that effectively separates shared and specific factors across studies.
Findings
Identified meaningful shared and study-specific factors in platelet gene expression data.
Proved that the model identifies the latent factors uniquely.
Demonstrated the model on real-world genomics data from patients in different disease groups.
Abstract
High-dimensional data often exhibit variation that can be captured by lower dimensional factors. For high-dimensional data from multiple studies or environments, one goal is to understand which underlying factors are common to all studies, and which factors are study or environment-specific. As a particular example, we consider platelet gene expression data from patients in different disease groups. In this data, factors correspond to clusters of genes which are co-expressed; we may expect some clusters (or biological pathways) to be active for all diseases, while some clusters are only active for a specific disease. To learn these factors, we consider a nonlinear multi-study factor model, which allows for both shared and specific factors. To fit this model, we propose a multi-study sparse variational autoencoder. The underlying model is sparse in that each observed feature (i.e. each…
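To make the structure described in the abstract concrete, here is a minimal simulation sketch, not the authors' sparse variational autoencoder. It only generates data from a toy nonlinear multi-study factor model in which some factors are active in every study, others are active in one study only, and each feature depends on a small subset of factors. All sizes, the tanh decoder, and the names `mask`, `factor_activity`, `decode`, and `simulate_study` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_studies = 3      # e.g. three disease groups
n_per_study = 50   # samples per study
n_features = 20    # observed features (e.g. genes)
k_shared = 2       # factors active in every study
k_specific = 1     # additional factors owned by each study
k_total = k_shared + n_studies * k_specific

# Sparse feature-to-factor mask: feature g may depend on factor k only if mask[g, k].
mask = rng.random((n_features, k_total)) < 0.2
# Guarantee that every feature depends on at least one factor.
mask[np.arange(n_features), rng.integers(0, k_total, n_features)] = True

# Toy nonlinear decoder weights (an assumption, not the paper's architecture).
W1 = rng.normal(size=(k_total, 8))
w2 = rng.normal(size=(n_features, 8))


def factor_activity(study):
    """Return a 0/1 vector marking which factors are switched on for a study."""
    active = np.zeros(k_total)
    active[:k_shared] = 1.0                      # shared factors
    start = k_shared + study * k_specific
    active[start:start + k_specific] = 1.0       # this study's own factors
    return active


def decode(z):
    """Map factors z of shape (n, k_total) to feature means of shape (n, n_features)."""
    masked = z[:, None, :] * mask[None, :, :]    # (n, features, factors)
    hidden = np.tanh(masked @ W1)                # (n, features, 8)
    return (hidden * w2[None, :, :]).sum(axis=-1)


def simulate_study(study, noise_sd=0.1):
    """Draw factors and noisy observations for one study."""
    z = rng.normal(size=(n_per_study, k_total)) * factor_activity(study)
    x = decode(z) + noise_sd * rng.normal(size=(n_per_study, n_features))
    return z, x


data = [simulate_study(s) for s in range(n_studies)]
```

In this sketch, the factors belonging to other studies are exactly zero for a given study, and changing a factor that is masked out for a feature leaves that feature's mean unchanged. Fitting such a model, as opposed to simulating from it, would additionally require an encoder and a way to learn the mask, which the sketch does not attempt.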
Taxonomy
Topics: Gene expression and cancer classification · Single-cell and spatial transcriptomics · Statistical Methods and Inference
