Evaluating unsupervised disentangled representation learning for genomic discovery and disease risk prediction
Taedong Yun

TL;DR
This paper evaluates several unsupervised methods for learning disentangled representations (autoencoders, VAE, beta-VAE, and FactorVAE) for genomic discovery and disease risk prediction, using clinical data (spirograms) from UK Biobank.
Contribution
It compares how well these representation learning methods improve genetic association studies and disease risk prediction, highlighting the benefits of FactorVAE and beta-VAE.
Findings
FactorVAE and beta-VAE increase the number of genome-wide significant genetic loci detected.
Disentangled representations enhance polygenic risk score performance.
FactorVAE is robust across hyperparameters, unlike beta-VAE.
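Polygenic risk scores are conventionally computed as a weighted sum of an individual's effect-allele dosages, with per-variant weights taken from association summary statistics. The sketch below illustrates only that generic definition; the function name `polygenic_risk_score` and the toy arrays are hypothetical, and the paper's own PRS construction (variant selection, weighting, and how the learned embeddings feed into it) is not described on this page.

```python
import numpy as np


def polygenic_risk_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Generic PRS sketch (hypothetical helper): score_i = sum_j weights[j] * dosages[i, j].

    dosages: (n_individuals, n_variants) effect-allele counts, typically in [0, 2].
    weights: (n_variants,) per-variant effect sizes, e.g. from GWAS summary statistics.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be 2-D with one column per weight")
    # Matrix-vector product sums weight * dosage over variants for each individual.
    return dosages @ weights


# Toy example with made-up numbers: 3 individuals, 4 variants.
g = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
beta = np.array([0.10, -0.05, 0.20, 0.03])
scores = polygenic_risk_score(g, beta)
print(scores)
```

In practice the scores are usually standardized and evaluated against disease status (e.g. by AUC); that step is omitted here.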
Abstract
High-dimensional clinical data have become invaluable resources for genetic studies, due to their accessibility in biobank-scale datasets and the development of high-performance modeling techniques, especially those using deep learning. Recent work has shown that low-dimensional embeddings of these clinical data learned by variational autoencoders (VAE) can be used for genome-wide association studies and polygenic risk prediction. In this work, we consider multiple unsupervised learning methods for learning disentangled representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the context of genetic association studies. Using spirograms from UK Biobank as a running example, we observed improvements in the number of genome-wide significant loci, heritability, and performance of polygenic risk scores for asthma and chronic obstructive pulmonary disease by using FactorVAE or…
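As background on the compared objectives: a VAE minimizes a reconstruction error plus the KL divergence between a diagonal-Gaussian encoder posterior and a standard-normal prior; beta-VAE scales the KL term by a weight beta (beta = 1 recovers the VAE); FactorVAE keeps the VAE loss and adds a total-correlation penalty weighted by gamma, estimated with a discriminator that tells encoded latents apart from latents whose dimensions were shuffled independently across the batch. The NumPy sketch below shows only the loss arithmetic and the shuffling step. It assumes a squared-error reconstruction term and a single-logit discriminator, omits the networks themselves, and all function names are hypothetical; it is not the paper's implementation.

```python
import numpy as np


def gaussian_kl(mu: np.ndarray, logvar: np.ndarray) -> np.ndarray:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, one value per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)


def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0) -> float:
    """Batch mean of squared-error reconstruction + beta * KL.

    Squared error is an assumption (a fixed-variance Gaussian decoder, up to scaling).
    beta = 1 gives the plain VAE objective; beta > 1 pushes harder toward the factorized prior.
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)
    return float(np.mean(recon + beta * gaussian_kl(mu, logvar)))


def permute_dims(z: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """FactorVAE-style shuffle: permute each latent dimension independently across the batch,
    giving approximate samples from the product of the marginals of q(z)."""
    z_perm = np.empty_like(z)
    for j in range(z.shape[1]):
        z_perm[:, j] = z[rng.permutation(z.shape[0]), j]
    return z_perm


def factor_vae_loss(vae_loss: float, disc_logits: np.ndarray, gamma: float) -> float:
    """VAE loss + gamma * TC, with TC estimated by the density-ratio trick.

    If D(z) = sigmoid(s) is the discriminator's probability that z came from q(z)
    rather than from the permuted batch, then log(D / (1 - D)) = s, so the TC
    estimate is the mean logit s over the (unpermuted) batch.
    """
    return vae_loss + gamma * float(np.mean(disc_logits))


# Toy usage with random stand-in tensors (no trained encoder/decoder involved).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                    # batch of 8 inputs, 5 arbitrary features
mu, logvar = rng.normal(size=(8, 2)), np.zeros((8, 2))
loss_vae = beta_vae_loss(x, x + 0.1, mu, logvar, beta=1.0)
loss_beta = beta_vae_loss(x, x + 0.1, mu, logvar, beta=4.0)
z_shuffled = permute_dims(mu, rng)             # would be fed to the discriminator as "fake"
```

Because the KL term is non-negative, raising beta can only increase this loss for a fixed encoder output; the trade-off between reconstruction quality and disentanglement is what the hyperparameter controls.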
Taxonomy
Topics: Genetic Associations and Epidemiology · Cancer-related molecular mechanisms research · Bioinformatics and Genomic Networks
