Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies
Yue Wang, Haoran Shi

TL;DR
This paper introduces a latent factor model that directly estimates higher-level correlations from lower-level biological measurements, avoiding data aggregation and improving accuracy in gene-pathway and proteomics studies.
Contribution
A novel latent factor approach for direct estimation of higher-level correlations from lower-level data, with a shrinkage estimator and asymptotic normality for inference.
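The role of shrinkage in guaranteeing positive definiteness can be illustrated with a generic linear shrinkage toward the identity matrix. This is a minimal sketch of that standard technique, not the estimator proposed in the paper; the function name `shrink_to_identity` and the threshold `min_eig` are hypothetical choices made for illustration.

```python
import numpy as np

def shrink_to_identity(corr, min_eig=1e-3):
    """Linearly shrink a symmetric, unit-diagonal matrix toward the identity.

    Returns (1 - lam) * corr + lam * I with the smallest lam in [0, 1) such
    that the smallest eigenvalue is at least `min_eig`. This is a generic
    textbook-style shrinkage (an assumption for illustration), not the
    paper's estimator.
    """
    corr = np.asarray(corr, dtype=float)
    smallest = np.linalg.eigvalsh(corr)[0]  # eigenvalues come back in ascending order
    if smallest >= min_eig:
        return corr.copy(), 0.0
    # Eigenvalues of the shrunk matrix are (1 - lam) * e + lam, so solve
    # (1 - lam) * smallest + lam = min_eig for lam.
    lam = (min_eig - smallest) / (1.0 - smallest)
    shrunk = (1.0 - lam) * corr + lam * np.eye(corr.shape[0])
    return shrunk, lam

# An indefinite "correlation-like" matrix, as can arise when entries are
# estimated one pair at a time.
raw = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
shrunk, lam = shrink_to_identity(raw)
```

Because the identity has ones on its diagonal, the shrunk matrix keeps a unit diagonal, so it remains a valid correlation matrix once its eigenvalues are positive.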
Findings
Accurate estimation of higher-level correlations demonstrated in simulations.
Effective application to proteomics and gene expression datasets.
R package highcor implemented for practical use.
Abstract
This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g., proteins and gene pathways) when only lower-level measurements are directly observed (e.g., peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient…
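The aggregation problem described in the abstract can be seen in a small simulation. The sketch below assumes a simple one-factor data-generating model (each lower-level measurement is a loading times its higher-level variable plus independent noise); the function names, loadings, noise scale, and sample size are all hypothetical and are not taken from the paper or from the highcor package.

```python
import numpy as np

def simulate_lower_level(n=500, rho=0.6, m=5, seed=0):
    """Simulate two correlated higher-level variables, each observed only
    through m noisy lower-level measurements (assumed toy model)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)  # shape (n, 2)
    loadings_a = rng.uniform(0.5, 1.5, size=m)
    loadings_b = rng.uniform(0.5, 1.5, size=m)
    # Broadcasting: (n, 1) * (m,) gives an (n, m) block per higher-level variable.
    x_a = z[:, [0]] * loadings_a + rng.normal(scale=1.0, size=(n, m))
    x_b = z[:, [1]] * loadings_b + rng.normal(scale=1.0, size=(n, m))
    return x_a, x_b

def aggregated_correlation(x_a, x_b, aggregate):
    """Aggregate each block across its m columns, then correlate the results."""
    a = aggregate(x_a, axis=1)
    b = aggregate(x_b, axis=1)
    return float(np.corrcoef(a, b)[0, 1])

x_a, x_b = simulate_lower_level()
estimates = {
    name: aggregated_correlation(x_a, x_b, fn)
    for name, fn in [("mean", np.mean), ("median", np.median), ("max", np.max)]
}
for name, value in estimates.items():
    print(f"{name:>6}: {value:.3f}")
```

Each aggregation rule produces its own correlation estimate from the same lower-level data, which is the ambiguity the abstract attributes to aggregation-based approaches.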
Taxonomy
Topics: Gene expression and cancer classification
