A latent factor model with a mixture of sparse and dense factors to model gene expression data with confounding effects
Chuan Gao, Christopher D Brown, Barbara E Engelhardt

TL;DR
This paper introduces a Bayesian latent factor model for gene expression data that mixes sparse and dense factors: dense factors capture confounding effects, and sparse factors capture local gene interactions.
Contribution
The paper proposes a Bayesian model that applies three layers of shrinkage (global, factor-specific, and element-wise) to the loading matrix and uses a two-component mixture to distinguish sparse from dense factors. The number of factors is estimated from the data.
Findings
Successfully recovered true latent structures in simulated data
Identified known covariates and gene groups in real data
Discovered biologically relevant genetic regulators
Abstract
One important problem in genome science is to determine sets of co-regulated genes based on measurements of gene expression levels across samples, where the quantification of expression levels includes substantial technical and biological noise. To address this problem, we developed a Bayesian sparse latent factor model that uses a three-parameter beta prior to flexibly model shrinkage in the loading matrix. By applying three layers of shrinkage to the loading matrix (global, factor-specific, and element-wise), this model has non-parametric properties in that it estimates the appropriate number of factors from the data. We added a two-component mixture to model each factor loading as being generated from either a sparse or a dense mixture component; this allows dense factors that capture confounding noise, and sparse factors that capture local gene interactions. We developed two…
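To make the sparse/dense distinction in the abstract concrete, the following is a minimal simulation sketch, not the authors' model or inference code. It assumes a generic factor model (expression = loadings × factors + noise) and replaces the three-parameter beta prior with a simple zero-masking stand-in for sparsity. The function name `simulate_mixed_factors` and all of its parameters and default values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_mixed_factors(n_genes=200, n_samples=50, n_factors=6,
                           p_dense=0.5, frac_nonzero=0.1,
                           noise_sd=0.5, seed=0):
    """Simulate expression data from a mix of sparse and dense factors.

    Assumption: sparsity is imposed by zeroing loadings at random,
    a stand-in for (not an implementation of) the paper's prior.
    """
    rng = np.random.default_rng(seed)

    # One indicator per factor: True = dense, False = sparse.
    is_dense = rng.random(n_factors) < p_dense

    # Start from fully dense Gaussian loadings (genes x factors).
    loadings = rng.normal(size=(n_genes, n_factors))

    # For sparse factors, keep only a small random subset of genes;
    # dense factors keep every gene.
    keep = rng.random((n_genes, n_factors)) < frac_nonzero
    mask = np.where(is_dense[None, :], True, keep)
    loadings = loadings * mask

    # Latent factor values per sample, plus independent Gaussian noise.
    factors = rng.normal(size=(n_factors, n_samples))
    noise = rng.normal(scale=noise_sd, size=(n_genes, n_samples))

    expression = loadings @ factors + noise
    return expression, loadings, factors, is_dense

Y, Lam, X, is_dense = simulate_mixed_factors()

# Fraction of genes with a nonzero loading on each factor.
nonzero_frac = (Lam != 0).mean(axis=0)
for k, (dense, frac) in enumerate(zip(is_dense, nonzero_frac)):
    kind = "dense" if dense else "sparse"
    print(f"factor {k}: {kind}, {frac:.0%} of genes loaded")
```

In this sketch, a dense factor touches every gene, as a batch effect or other confounder might, while a sparse factor touches only a small gene subset, as a co-regulated group might. The model described in the abstract infers this structure from data rather than fixing it in advance.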
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Statistical Methods and Inference
