Univariate-Guided Sparse Regression for Biobank-Scale High-Dimensional Omics Data
Joshua Richland, Tuomo Kiiskinen, William Wang, Sophia Lu, Balasubramanian Narasimhan, Trevor Hastie, Manuel Rivas, Robert Tibshirani

TL;DR
This paper adapts uniLasso, a scalable and interpretable sparse regression method, to high-dimensional genomic data and shows on the UK Biobank that it computes polygenic risk scores (PRS) using fewer variants with comparable or better accuracy.
Contribution
The paper adapts uniLasso for large-scale biobank data, incorporating external summary statistics to improve prediction and interpretability of polygenic risk scores.
Findings
uniLasso achieves similar predictive performance to Lasso with fewer variants.
uniLasso outperforms competing methods such as PRS-CS at estimating PRS.
Incorporating external scores enhances prediction accuracy and sparsity.
Abstract
We present a scalable framework for computing polygenic risk scores (PRS) in high-dimensional genomic settings using the recently introduced Univariate-Guided Sparse Regression (uniLasso). UniLasso is a two-stage penalized regression procedure that leverages univariate coefficients and magnitudes to stabilize feature selection and enhance interpretability. Building on its theoretical and empirical advantages, we adapt uniLasso for application to the UK Biobank, a population-based repository comprising over one million genetic variants measured on hundreds of thousands of individuals from the United Kingdom. We further extend the framework to incorporate external summary statistics to increase predictive accuracy. Our results demonstrate that uniLasso attains predictive performance comparable to standard Lasso while selecting substantially fewer variants, yielding sparser and more…
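The two-stage structure mentioned in the abstract can be illustrated with a small sketch. The abstract only states that uniLasso is a two-stage penalized regression guided by univariate coefficients; the specifics below (leave-one-out univariate fits in stage one, a non-negative lasso in stage two) are assumptions based on the general description of uniLasso, not details taken from this page. The function names `loo_univariate_fits`, `nonneg_lasso`, and `unilasso_fit` are hypothetical, and the plain NumPy coordinate descent is meant for toy data rather than biobank-scale inputs.

```python
import numpy as np

def loo_univariate_fits(X, y):
    """Stage 1 (assumed): one least-squares fit per feature, plus
    leave-one-out (LOO) fitted values for every observation."""
    n, _ = X.shape
    xbar, ybar = X.mean(axis=0), y.mean()
    Xc = X - xbar
    sxx = (Xc ** 2).sum(axis=0)
    slopes = Xc.T @ (y - ybar) / sxx
    intercepts = ybar - slopes * xbar
    fitted = intercepts + X * slopes                 # shape (n, p)
    leverage = 1.0 / n + Xc ** 2 / sxx               # hat values per feature
    # LOO prediction = y - residual / (1 - leverage)
    loo = y[:, None] - (y[:, None] - fitted) / (1.0 - leverage)
    return slopes, intercepts, loo

def nonneg_lasso(F, y, lam, n_iter=200):
    """Stage 2 (assumed): lasso with theta >= 0, solved by coordinate descent
    on (1/2n)||y - theta0 - F theta||^2 + lam * sum(theta)."""
    n, p = F.shape
    fmean, ymean = F.mean(axis=0), y.mean()
    Fc, resid = F - fmean, y - ymean
    col_ss = (Fc ** 2).sum(axis=0) / n
    theta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = Fc[:, j] @ resid / n + col_ss[j] * theta[j]
            new = max(0.0, rho - lam) / col_ss[j]
            resid -= Fc[:, j] * (new - theta[j])     # keep residual in sync
            theta[j] = new
    return ymean - fmean @ theta, theta

def unilasso_fit(X, y, lam):
    """Combine both stages into coefficients on the original features."""
    slopes, intercepts, loo = loo_univariate_fits(X, y)
    theta0, theta = nonneg_lasso(loo, y, lam)
    coef = theta * slopes                            # sign follows the univariate slope
    intercept = theta0 + theta @ intercepts
    return intercept, coef, slopes

# Toy usage on synthetic data (not genomic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
intercept, coef, slopes = unilasso_fit(X, y, lam=0.1)
print("selected features:", np.flatnonzero(coef))
```

Because the second-stage weights are constrained to be non-negative in this sketch, each final coefficient is either zero or has the same sign as its univariate slope, which is one way such a procedure can stabilize selection and aid interpretation.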
Taxonomy
Topics: Genetic Associations and Epidemiology · Statistical Methods and Inference · Gene expression and cancer classification
