Deep neural networks with controlled variable selection for the identification of putative causal genetic variants
Peyman H. Kassani, Fred Lu, Yann Le Guen, Zihuai He

TL;DR
This paper introduces an interpretable, stabilized deep neural network approach with controlled variable selection for identifying causal genetic variants, enhancing interpretability, power, and computational efficiency in genetic studies.
Contribution
It proposes a novel neural network model with ensembling and knockoffs for robust, interpretable variable selection in genome sequencing, addressing randomness and false discovery issues.
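To illustrate why ensembling can help with run-to-run randomness in variable selection, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a generic scheme where per-feature importance scores from several independently trained networks are simply averaged, and the scores in `runs` are made-up numbers. The names `ensemble_importance` and `top_k` are hypothetical helpers introduced for this example.

```python
from statistics import mean


def ensemble_importance(runs):
    """Average per-feature importance scores across independent runs.

    `runs` is a list of equal-length lists; runs[r][j] is the importance
    that run r (e.g. one random seed) assigned to feature j.
    """
    n_features = len(runs[0])
    return [mean(run[j] for run in runs) for j in range(n_features)]


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical importance scores from three runs with different seeds.
runs = [
    [0.9, 0.1, 0.5, 0.2],
    [0.3, 0.2, 0.7, 0.1],
    [0.7, 0.1, 0.6, 0.3],
]

# The single top feature differs between runs, which is the instability
# the summary refers to.
per_run_top = [top_k(run, 1)[0] for run in runs]

# Averaging across runs gives one ranking that no single seed dictates.
avg = ensemble_importance(runs)
stable_top2 = top_k(avg, 2)
```

In this toy input the individual runs disagree on the top feature, while the averaged scores yield a single ranking. Plain averaging is only one possible aggregation rule and is used here as an assumption.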
Findings
Improved detection of causal variants in simulations.
More discoveries in Alzheimer disease genetics.
Enhanced interpretability and stability of variable selection.
Abstract
Deep neural networks (DNN) have been used successfully in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. In this paper, we consider the problem of scalable, robust variable selection in DNN for the identification of putative causal genetic variants in genome sequencing studies. We identified a pronounced randomness in feature selection in DNN due to its stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: (1) flexible modelling of the non-linear effect of genetic variants to improve statistical power; (2) multiple knockoffs in the input layer to rigorously control…
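The selection step behind knockoff-based variable selection can be sketched as follows. This is a hedged illustration of the standard single-knockoff "knockoff+" threshold, not the multiple-knockoff procedure the abstract mentions and not the authors' code. It assumes each feature j already has a statistic W_j (for example, the importance of the original variant minus the importance of its knockoff copy), where a large positive value suggests a real signal. The function names and the example statistics are hypothetical.

```python
def knockoff_threshold(w, q=0.1):
    """Knockoff+ threshold for a target false discovery rate q.

    Returns the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        n_neg = sum(1 for x in w if x <= -t)  # proxy for false positives
        n_pos = sum(1 for x in w if x >= t)   # features that would be selected
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return float("inf")


def knockoff_select(w, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Made-up statistics: six features with clearly positive W_j, four near zero.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 0.1]
selected = knockoff_select(w, q=0.2)
```

Negative statistics act as an estimate of how many false positives sit above the threshold, which is what lets the filter bound the false discovery rate. With a stricter target such as q=0.1, this small example selects nothing, because the ratio can never fall below 1 divided by the number of positive statistics.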
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Genetic Associations and Epidemiology
Methods: Feature Selection
