Recovering Direct Effects in Genetics: A Comparison
Matthew Sperrin, Thomas Jaki

TL;DR
This paper compares methods for identifying SNPs that are directly related to a disease rather than merely associated with it, and finds that stability selection and direct effect testing remain effective whether the correlation between predictors is low, moderate, or large.
Contribution
It provides a comprehensive simulation-based comparison of existing methods for true sparsity recovery in genetic data analysis.
Findings
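Lasso-based methods perform well at true sparsity recovery when correlation between predictors is low or moderate, despite not being designed for that purpose.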
Specialized methods are needed for high correlation.
Stability selection and direct effect testing are robust across scenarios.
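As a rough illustration of what stability selection on top of the lasso looks like, the sketch below refits a lasso on random half-samples and keeps the predictors that are selected in most of them. It is a minimal sketch under stated assumptions, not the paper's implementation or simulation design: the function names (`lasso_cd`, `stability_selection`), the single fixed penalty `lam` (the usual formulation scans a grid of penalties), the 0.8 frequency threshold, and the simulated genotype data with independent SNPs are all chosen here for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1/2n) * ||y - X b||^2 + lam * ||b||_1 for centred X and y.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column in this subsample
            # Covariance of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def stability_selection(X, y, lam, n_subsamples=50, threshold=0.8, seed=0):
    """Selection frequency of each predictor over random half-samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(Xs, ys, lam) != 0
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)

# Hypothetical data: 200 individuals, 20 independent SNPs coded 0/1/2,
# with SNPs 0 and 3 having a direct effect on a continuous trait.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(200, 20)).astype(float)
y = 2.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)

freq, selected = stability_selection(X, y, lam=0.3)
print("selection frequencies:", np.round(freq, 2))
print("selected SNP indices:", selected)
```

The simulated SNPs here are independent, which is the easy case described above. Generating correlated columns instead would be one way to explore the high-correlation setting, where a plain lasso fit is more likely to pick up predictors that are merely associated with the trait.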
Abstract
In genetics it is often of interest to discover single nucleotide polymorphisms (SNPs) that are directly related to a disease, rather than just being associated with it. Few methods exist, however, addressing this so-called 'true sparsity recovery' issue. In a thorough simulation study, we show that for moderate or low correlation between predictors, lasso-based methods perform well at true sparsity recovery, despite not being specifically designed for this purpose. For large correlations, however, more specialised methods are needed. Stability selection and direct effect testing perform well in all situations, including when the correlation is large.
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Molecular Biology Techniques and Applications
