Nonlinear multi-study factor analysis

Gemma E. Moran; Anandi Krishnan

arXiv:2601.18128 · stat.ML · January 27, 2026

Open Access

TL;DR

This paper introduces a nonlinear multi-study factor model using a sparse variational autoencoder to identify shared and study-specific factors in high-dimensional data, demonstrated on gene expression data.

Contribution

The paper proposes a nonlinear multi-study factor model, fit with a multi-study sparse variational autoencoder, that separates factors shared across all studies from factors specific to a single study.

Findings

01

Successfully identified meaningful shared and specific factors in gene expression data.

02

Proved that the model's latent factors can be uniquely identified.

03

Demonstrated the model's effectiveness on real-world genomics data.

Abstract

High-dimensional data often exhibit variation that can be captured by lower-dimensional factors. For high-dimensional data from multiple studies or environments, one goal is to understand which underlying factors are common to all studies, and which factors are study- or environment-specific. As a particular example, we consider platelet gene expression data from patients in different disease groups. In these data, factors correspond to clusters of genes which are co-expressed; we may expect some clusters (or biological pathways) to be active for all diseases, while some clusters are only active for a specific disease. To learn these factors, we consider a nonlinear multi-study factor model, which allows for both shared and specific factors. To fit this model, we propose a multi-study sparse variational autoencoder. The underlying model is sparse in that each observed feature (i.e., each…
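To make the shared-versus-specific structure concrete, the following is a minimal simulation sketch and not the authors' model or code. The function name `simulate_multi_study`, the `tanh` nonlinearity, the Gaussian noise, and the rule that each feature loads on exactly one factor are all illustrative assumptions; the abstract above does not specify these details.

```python
import math
import random


def simulate_multi_study(n_studies=3, n_per_study=50, n_features=12,
                         n_shared=2, n_specific=1, noise_sd=0.1, seed=0):
    """Simulate toy data with shared and study-specific latent factors.

    Hypothetical illustration only: the nonlinearity, noise model, and
    sparsity pattern are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n_factors = n_shared + n_studies * n_specific

    # Sparse loadings (assumption): feature j depends on a single factor.
    weights = [[0.0] * n_factors for _ in range(n_features)]
    for j in range(n_features):
        weights[j][j % n_factors] = rng.gauss(0.0, 1.0)

    data, labels, factors = [], [], []
    for s in range(n_studies):
        # Shared factors are active in every study; each study also
        # activates only its own block of specific factors.
        start = n_shared + s * n_specific
        active = set(range(n_shared)) | set(range(start, start + n_specific))
        for _ in range(n_per_study):
            z = [rng.gauss(0.0, 1.0) if k in active else 0.0
                 for k in range(n_factors)]
            # Nonlinear map from factors to features (tanh is an assumption).
            x = [math.tanh(sum(w * zk for w, zk in zip(weights[j], z)))
                 + rng.gauss(0.0, noise_sd)
                 for j in range(n_features)]
            data.append(x)
            labels.append(s)
            factors.append(z)
    return data, labels, factors, weights
```

Calling `simulate_multi_study()` returns the observations, their study labels, the latent factors, and the sparse loading matrix. In this toy setup, the factors belonging to other studies are exactly zero for every sample, which is the kind of structure a multi-study factor model would aim to recover from the observations alone.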

Taxonomy

Topics: Gene expression and cancer classification · Single-cell and spatial transcriptomics · Statistical Methods and Inference