Genome-wide association studies with high-dimensional phenotypes
Pekka Marttinen, Jussi Gillberg, Aki Havulinna, Jukka Corander, and Samuel Kaski

TL;DR
This paper compares methods for genome-wide association studies (GWAS) with high-dimensional phenotypes and shows that canonical correlation analysis (CCA) offers higher power while remaining computationally feasible, especially when the sample size is sufficient.
Contribution
It provides a systematic comparison of methods for GWAS with high-dimensional phenotypes, introduces a block-based testing approach, and evaluates the methods on simulated and real data.
Findings
Canonical correlation analysis outperforms other methods in power
Block-based testing reduces computational burden
Sparse methods perform well with small sample sizes
Abstract
High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if the traits were tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable…
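The joint test described above can be illustrated for a single SNP. The sketch below assumes the standard CCA formulation for one genotype vector against a matrix of p traits: with a single variable on one side there is exactly one canonical correlation ρ, its square equals the R² from regressing the genotype on all traits, and Wilks' lambda (here 1 - ρ²) yields an exact F-test with (p, n - p - 1) degrees of freedom. Because one test is run per SNP rather than one per SNP-trait pair, a Bonferroni correction would divide by roughly 550,000 SNPs instead of about 75 million SNP-trait pairs (550,000 × 137). The function name `cca_snp_test` and the simulated data are hypothetical illustrations, not the authors' code; the sketch uses NumPy and SciPy and ignores covariates and population structure.

```python
import numpy as np
from scipy import stats


def cca_snp_test(genotype, phenotypes):
    """Hypothetical helper: jointly test one SNP against p phenotype traits via CCA.

    genotype:   array of shape (n,), e.g. allele counts 0/1/2; assumed polymorphic
    phenotypes: array of shape (n, p)
    Returns (rho, F, p_value), where rho is the single canonical correlation.
    """
    x = np.asarray(genotype, dtype=float)
    Y = np.asarray(phenotypes, dtype=float)
    n, p = Y.shape
    if n <= p + 1:
        raise ValueError("need more individuals than traits + 1 for the F-test")

    # Center both sides so the regression below needs no intercept column.
    x = x - x.mean()
    Y = Y - Y.mean(axis=0)

    # With a single variable on one side, CCA reduces to multiple regression:
    # the squared canonical correlation equals the R^2 of regressing x on Y.
    beta, *_ = np.linalg.lstsq(Y, x, rcond=None)
    resid = x - Y @ beta
    r2 = 1.0 - (resid @ resid) / (x @ x)

    # Wilks' lambda is 1 - r2 here, and its F transform is exact.
    df1, df2 = p, n - p - 1
    F = (r2 / df1) / ((1.0 - r2) / df2)
    p_value = stats.f.sf(F, df1, df2)
    return float(np.sqrt(r2)), float(F), float(p_value)


# Usage on simulated data: a small SNP effect spread over three of 20 traits.
rng = np.random.default_rng(0)
n, p = 1000, 20
g = rng.binomial(2, 0.3, size=n)
Y = rng.standard_normal((n, p))
Y[:, :3] += 0.15 * g[:, None]
rho, F, p_value = cca_snp_test(g, Y)
print(f"rho={rho:.3f}  F={F:.2f}  p={p_value:.2e}")
```

In this formulation the per-SNP cost is a single least-squares solve, and the test needs comfortably more individuals than traits (more than 138 for 137 traits) before the F-test is defined at all.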
Taxonomy
Topics: Gene expression and cancer classification · Genetic Associations and Epidemiology · Bioinformatics and Genomic Networks
