Causality and surrogate variable analysis
Emiliano Diaz

TL;DR
This paper introduces a causal modeling framework for surrogate variable analysis (SVA) in gene expression data, clarifying its assumptions, methodology, and implementation to improve understanding of unobserved factors affecting gene expression.
Contribution
It defines a class of additive gene expression SEMs that underpin SVA, providing a causal and modeling justification for the methodology and its implementation in R.
Findings
Provides a causal interpretation of SVA
Defines additive gene expression SEMs for modeling
Details the SVA methodology and R implementation
Abstract
Gene expression depends on thousands of factors, yet we usually have access to only tens or hundreds of observations of gene expression levels, which places us in a high-dimensional setting. Additionally, we do not always observe or care about all the factors. However, many different gene expression levels depend on a set of common factors. By observing the joint variance of the gene expression levels together with the observed primary variables (those we care about), Surrogate Variable Analysis (SVA) seeks to estimate the remaining unobserved factors. The ultimate goal is to assess whether the primary variable (or vector) has a significant effect on the different gene expression levels, but unless the unobserved factors are estimated first, the various regression models and hypothesis tests are dependent, which complicates significance analysis. In this work we define a class of additive gene…
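The idea in the abstract, estimating an unobserved common factor from the joint variation left over after accounting for the primary variable, can be illustrated with a minimal sketch. This is not the paper's R implementation or the full SVA procedure. It assumes a single primary variable, a single hidden factor, and simulated data, and it simply takes the leading component of the regression residuals. All names (`residualize`, `estimate_surrogate`, the simulation parameters) are hypothetical.

```python
import random


def residualize(x, y):
    """Residuals from a least-squares fit of x on y (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    syy = sum((b - my) ** 2 for b in y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / syy
    return [(a - mx) - beta * (b - my) for a, b in zip(x, y)]


def estimate_surrogate(expression, y, iters=100):
    """Leading sample-space component of the residual matrix.

    expression: list of genes, each a list of per-sample values.
    Uses power iteration on R^T R, where R is the genes x samples
    matrix of residuals after regressing each gene on y.
    """
    R = [residualize(row, y) for row in expression]
    n = len(y)
    v = [1.0 / (j + 1) for j in range(n)]  # deterministic starting vector
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(n)) for r in R]                 # u = R v
        w = [sum(R[i][j] * u[i] for i in range(len(R))) for j in range(n)]  # w = R^T u
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


# Simulated data (an assumption for illustration): each gene is an additive
# combination of the primary variable y, a hidden factor g, and noise.
rng = random.Random(0)
n_samples, n_genes = 40, 60
y = [j % 2 for j in range(n_samples)]             # observed primary variable
g = [rng.gauss(0, 1) for _ in range(n_samples)]   # unobserved common factor
expression = []
for _ in range(n_genes):
    b, gamma = rng.gauss(0, 1), rng.gauss(0, 1)   # gene-specific effects
    expression.append(
        [b * y[j] + gamma * g[j] + rng.gauss(0, 0.3) for j in range(n_samples)]
    )

surrogate = estimate_surrogate(expression, y)
corr = pearson(surrogate, g)
print("absolute correlation with hidden factor:", abs(corr))
```

The sign of the recovered vector is arbitrary, so only the absolute correlation with the hidden factor is meaningful. In a real analysis the estimated surrogate would be added as a covariate to each gene's regression before testing the effect of the primary variable.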
Taxonomy
Topics: Gene Regulatory Network Analysis · Gene expression and cancer classification · Evolutionary Algorithms and Applications
