Genetic risk predictions using deep learning models with summary data
Angela Wang, Elena Xiao, Jason Cheng, Xiaoxi Shen

TL;DR
This paper shows that deep learning models can predict genetic risk from summary data alone when individual-level genetic data is not available.
Contribution
The study demonstrates that deep learning models trained only on summary data, such as linkage disequilibrium matrices, can predict genetic risk with accuracy comparable to models trained on individual-level genetic data.
Findings
Test mean squared errors of deep learning models using summary data are comparable to those using individual-level genetic data.
Deep learning methods can serve as an alternative for predicting disease-related traits when only linkage disequilibrium matrices are available.
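To make "summary data" concrete, here is a minimal sketch, not the paper's pipeline. It assumes an r-based linkage disequilibrium matrix, computed as the Pearson correlation between SNP genotype columns, and a purely linear predictor, for which the mean squared error can be recovered from the LD matrix and marginal SNP-trait covariances without individual records. The function names, the simulated genotypes, and the effect sizes are all hypothetical; the deep models studied in the paper are nonlinear and this identity does not carry over to them directly.

```python
import numpy as np

def standardize(genotypes):
    """Center each SNP column and scale it to unit variance."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    sd = centered.std(axis=0)
    if np.any(sd == 0):
        raise ValueError("monomorphic SNPs have undefined correlation")
    return centered / sd

def ld_matrix(genotypes):
    """r-based LD matrix: pairwise Pearson correlation between SNPs."""
    z = standardize(genotypes)
    return z.T @ z / z.shape[0]

def linear_mse_from_summary(beta, ld, marginal, y_second_moment):
    """MSE of the linear predictor Z @ beta using summary statistics only.

    Uses mean((y - Z b)^2) = mean(y^2) - 2 b'(Z'y / n) + b'(Z'Z / n) b.
    """
    return y_second_moment - 2.0 * beta @ marginal + beta @ ld @ beta

# Hypothetical simulated data: 500 individuals, 8 independent SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 500, 8
maf = rng.uniform(0.1, 0.5, size=p)        # assumed minor allele frequencies
G = rng.binomial(2, maf, size=(n, p))
Z = standardize(G)
y = Z @ rng.normal(size=p) + rng.normal(size=n)
y = y - y.mean()                           # centered trait

R = ld_matrix(G)                           # summary statistic 1: LD matrix
marginal = Z.T @ y / n                     # summary statistic 2: SNP-trait covariances
beta = rng.normal(size=p)                  # arbitrary candidate effect sizes

mse_summary = linear_mse_from_summary(beta, R, marginal, np.mean(y ** 2))
mse_direct = np.mean((y - Z @ beta) ** 2)  # needs individual-level data
```

In this sketch `mse_summary` and `mse_direct` agree up to floating-point error, which is the sense in which summary statistics can stand in for individual-level data in the linear case.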
Abstract
As a driving force of the Fourth Industrial Revolution, deep learning methods have achieved significant success across various fields, including genetic and genomic studies. While individual-level genetic data is ideal for deep learning models, privacy concerns and data-sharing restrictions often limit its availability to researchers. In this paper, we investigated the potential applications of deep learning models—including deep neural networks, convolutional neural networks, recurrent neural networks, and transformers—when only genetic summary data, such as linkage disequilibrium matrices, is available. The bootstrap method was used to approximate the test error. Simulation studies and real data analyses were conducted to compare the performance of deep learning methods in genetic risk prediction using individual-level genetic data versus genetic summary data. The test mean squared…
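The abstract mentions using the bootstrap to approximate the test error but this excerpt does not give the exact scheme. The sketch below shows one generic variant, the out-of-bag bootstrap, applied to individual-level data with a closed-form ridge regression standing in for a deep model. The function names, the ridge penalty, and the simulated data are assumptions made for illustration, and the procedure may differ from the one the authors used with summary data.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression (a stand-in for any trainable model)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def bootstrap_oob_mse(X, y, n_boot=200, lam=1.0, seed=0):
    """Average out-of-bag MSE over bootstrap resamples.

    Each replicate trains on rows drawn with replacement and evaluates
    on the rows that were not drawn.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)       # held-out rows
        if oob.size == 0:
            continue                                # nothing to evaluate on
        beta = fit_ridge(X[idx], y[idx], lam)
        errors.append(np.mean((y[oob] - X[oob] @ beta) ** 2))
    return float(np.mean(errors))

# Hypothetical simulated data with unit noise variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
beta_true = np.array([1.0, -0.5, 0.25, 0.0, 0.5])
y = X @ beta_true + rng.normal(size=300)

estimated_test_mse = bootstrap_oob_mse(X, y)
```

Because the simulated noise has unit variance, the estimate should land near 1 here; on real data there is no such reference value, which is why a resampling estimate is useful.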
Taxonomy
Topics: Genetic Associations and Epidemiology · Bioinformatics and Genomic Networks · Genetic Mapping and Diversity in Plants and Animals
