Statistical significance of variables driving systematic variation
Neo Christopher Chung, John D. Storey

TL;DR
The paper introduces the jackstraw, a statistical method for identifying genomic variables that are significantly associated with principal components. It addresses the over-fitting that arises because the components are computed from those same variables.
Contribution
The jackstraw is a significance test for associations between genomic variables and principal components (or other estimates of systematic variation). It is designed to avoid the artificially inflated significance that conventional methods produce in this setting.
Findings
The jackstraw method provides accurate significance measures in simulations.
It identifies cell-cycle regulated genes in yeast data.
It reveals inflammatory gene enrichment in post-trauma gene expression data.
Abstract
There are a number of well-established methods such as principal components analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the genomic variables that are drivers of systematic variation captured by PCA. Principal components (and other estimates of systematic variation) are directly constructed from the genomic variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting. We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically…
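The abstract is cut off before the procedure is described, so what follows is only a minimal sketch of the general idea, not the authors' implementation. It assumes one common reading of the approach: permute a small number of variables, recompute the principal component on the partly permuted data, and use the association statistics of the permuted variables as an empirical null that already carries the over-fitting. The function names (`top_pc`, `assoc_stat`, `jackstraw_pvalues`), the parameters `s` and `B`, the F-type statistic, the `+1` correction in the p-value, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def top_pc(Y):
    """First principal component (length n) of Y, an m-variables x n-observations matrix."""
    Yc = Y - Y.mean(axis=1, keepdims=True)          # center each variable (row)
    _, _, vt = np.linalg.svd(Yc, full_matrices=False)
    return vt[0]

def assoc_stat(Y, pc):
    """F-type statistic for a simple linear regression of each row of Y on pc."""
    n = Y.shape[1]
    Yc = Y - Y.mean(axis=1, keepdims=True)
    pc_c = pc - pc.mean()
    r2 = (Yc @ pc_c) ** 2 / ((Yc ** 2).sum(axis=1) * (pc_c ** 2).sum())
    r2 = np.clip(r2, 0.0, 1.0 - 1e-12)              # guard against division by zero
    return (n - 2) * r2 / (1.0 - r2)

def jackstraw_pvalues(Y, s=20, B=50, seed=0):
    """Empirical p-values for association between each row of Y and the top PC.

    s: number of rows permuted per iteration (assumed small relative to m)
    B: number of resampling iterations
    """
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    observed = assoc_stat(Y, top_pc(Y))
    null = np.empty((B, s))
    for b in range(B):
        rows = rng.choice(m, size=s, replace=False)
        Y_star = Y.copy()
        for i in rows:
            # Break any real association for these rows only
            Y_star[i] = rng.permutation(Y[i])
        # The PC is re-estimated with the permuted rows included, so the null
        # statistics carry the same over-fitting as the observed ones
        null[b] = assoc_stat(Y_star[rows], top_pc(Y_star))
    null = np.sort(null.ravel())
    n_ge = null.size - np.searchsorted(null, observed, side="left")
    return (n_ge + 1) / (null.size + 1)

# Simulated example: the first n_signal variables depend on one latent variable
rng = np.random.default_rng(1)
m, n, n_signal = 1000, 20, 100
latent = rng.standard_normal(n)
Y = rng.standard_normal((m, n))
Y[:n_signal] += 1.5 * latent

pvals = jackstraw_pvalues(Y)
print("median p-value, signal variables:", np.median(pvals[:n_signal]))
print("median p-value, null variables:  ", np.median(pvals[n_signal:]))
```

Permuting only a few rows at a time keeps the estimated component close to the one from the original data, which is why `s` is kept small relative to the number of variables in this sketch.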
