# Sparse polygenic risk score inference with the spike-and-slab LASSO

**Authors:** Junyi Song, Shadi Zabad, Archer Yang, Simon Gravel, Yue Li

PMC · DOI: 10.1093/bioinformatics/btaf578 · Bioinformatics · 2025-10-17

## TL;DR

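This paper introduces SSLPRS, a polygenic risk score method based on the Spike-and-Slab LASSO prior. It matches state-of-the-art prediction accuracy, selects causal variants more reliably (especially in sparse genetic architectures), and runs efficiently on GWAS summary statistics.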

## Contribution

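SSLPRS uses the Spike-and-Slab LASSO (SSL) prior, which bridges sparse Bayesian priors and penalized regression, and derives a coordinate-ascent algorithm that infers sparse polygenic risk scores directly from GWAS summary statistics.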
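For reference, the standard Spike-and-Slab LASSO prior of Ročková and George places a two-component Laplace mixture on each effect $\beta_j$, with a narrow "spike" ($\lambda_0$ large) and a diffuse "slab" ($\lambda_1$ small). The notation below is a sketch of that standard formulation, not necessarily the paper's own:

$$
\pi(\boldsymbol{\beta} \mid \theta) = \prod_{j=1}^{p} \Big[ \theta\, \psi_1(\beta_j) + (1-\theta)\, \psi_0(\beta_j) \Big],
\qquad
\psi_k(\beta) = \frac{\lambda_k}{2}\, e^{-\lambda_k |\beta|},
\qquad \lambda_0 \gg \lambda_1 .
$$

With $\lambda_0 = \lambda_1 = \lambda$ the mixture collapses to a single Laplace prior, whose posterior mode is the LASSO estimate; as $\lambda_0 \to \infty$ the spike approaches a point mass at zero, recovering the classic spike-and-slab prior. Intermediate settings interpolate between the two, which is the sense in which SSL bridges the frameworks.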

## Key findings

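- In simulations, SSLPRS improves positive predictive value by upwards of 50%.
- SSLPRS is competitive with state-of-the-art methods in prediction accuracy and outperforms them in variable selection, especially for sparse genetic architectures.
- In nine UK Biobank quantitative phenotypes, selected variants are highly enriched for meaningful genomic annotations and replicate better in larger meta-analyses.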
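Positive predictive value (PPV) is conventionally the fraction of selected variants that are truly causal, which is only meaningful for a method that sets some effects exactly to zero. A minimal sketch, assuming "selected" means a nonzero estimated effect and that the causal set is known (as it is in simulation); the function name and toy values are hypothetical:

```python
import numpy as np


def positive_predictive_value(beta_hat: np.ndarray, causal: np.ndarray) -> float:
    """Fraction of selected variants (nonzero estimates) that are truly causal."""
    selected = beta_hat != 0
    n_selected = int(selected.sum())
    if n_selected == 0:
        return float("nan")  # PPV is undefined when nothing is selected
    true_positives = int(np.sum(selected & causal))
    return true_positives / n_selected


# Hypothetical toy example: three variants selected, two of them causal.
beta_hat = np.array([0.31, 0.0, -0.12, 0.0, 0.05, 0.0])
causal = np.array([True, False, True, False, False, True])
print(positive_predictive_value(beta_hat, causal))  # 2 of 3 selected are causal
```

A method that never returns exact zeros selects every variant, so its PPV collapses to the overall fraction of causal variants unless an extra thresholding step is added.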

## Abstract

Large-scale biobanks, with rich phenotypic and genomic data across hundreds of thousands of samples, provide ample opportunities to elucidate the genetics of complex traits and diseases. Consequently, there is growing demand for robust and scalable methods for disease risk prediction from genotype data. Inference in this setting is challenging due to the high dimensionality of genomic data, especially when coupled with smaller sample sizes. Popular Polygenic Risk Score (PRS) inference methods address this challenge by adopting sparse Bayesian priors or penalized regression techniques, such as the Least Absolute Shrinkage and Selection Operator (LASSO). However, the former class of methods is less scalable and does not produce exact sparsity, while the latter tends to over-shrink large coefficients.
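The LASSO's over-shrinkage is easiest to see with an orthonormal design, where minimizing $\tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$ reduces to soft-thresholding the least-squares estimates. This is a textbook illustration rather than code from the paper, and `soft_threshold` is a hypothetical helper: every surviving coefficient is pulled toward zero by the same amount $\lambda$, so large effects carry the same absolute bias as barely-selected ones.

```python
import numpy as np


def soft_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """LASSO solution under an orthonormal design: shrink |z| by lam, clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


z = np.array([0.02, 0.10, 0.50, 2.00])  # least-squares estimates, small to large
lam = 0.05
shrunk = soft_threshold(z, lam)
print(shrunk)      # the smallest estimate is zeroed; the rest lose exactly lam
print(z - shrunk)  # absolute bias is lam for every nonzero coefficient, even the largest
```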

In this study, we present SSLPRS, a novel PRS method based on the Spike-and-Slab LASSO (SSL) prior, which offers a theoretical bridge between the two frameworks. We extend previous work to derive a coordinate-ascent inference algorithm that operates on GWAS summary statistics, which is orders of magnitude more efficient than corresponding individual-level implementations. To illustrate the statistical properties of the proposed model, we conducted experiments involving nine simulation configurations and nine quantitative phenotypes from the UK Biobank. Our results demonstrate that SSLPRS is competitive with state-of-the-art methods in terms of prediction accuracy and exhibits superior variable selection performance, especially in sparse genetic architectures. In simulations, this translates to upwards of 50% improvement in positive predictive value. In analysis of real phenotypes, we show that selected variants are highly enriched for meaningful genomic annotations and have better replication rates in larger meta-analyses.
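The idea of SSL coordinate ascent on summary statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the penprs implementation: genotypes are standardized so the LD matrix has unit diagonal, the residual variance is fixed at 1, the mixing weight θ is held fixed rather than updated, and the refined thresholding step of the original SSL algorithm is omitted. Each update subtracts the LD-weighted contribution of the other variants from the marginal GWAS effect, then soft-thresholds with the adaptive penalty λ*(β) = p*(β)·λ1 + (1 - p*(β))·λ0, where p*(β) is the posterior weight of the slab at β. Function names and default hyperparameters are hypothetical.

```python
import numpy as np


def ssl_adaptive_penalty(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Adaptive SSL penalty lambda*(beta) = p* * lam1 + (1 - p*) * lam0.

    p*(beta) = theta*psi1 / (theta*psi1 + (1 - theta)*psi0) is rewritten as
    1 / (1 + ratio) so the exponential cannot overflow when lam0 > lam1.
    """
    ratio = ((1.0 - theta) / theta) * (lam0 / lam1) * np.exp(-(lam0 - lam1) * abs(beta))
    p_star = 1.0 / (1.0 + ratio)
    return p_star * lam1 + (1.0 - p_star) * lam0


def ssl_coordinate_ascent(beta_marginal: np.ndarray, ld: np.ndarray, n: int,
                          theta: float = 0.5, lam0: float = 500.0, lam1: float = 1.0,
                          max_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Hypothetical SSL coordinate ascent from GWAS summary statistics.

    beta_marginal: marginal effects of standardized genotypes.
    ld: p x p LD (correlation) matrix with unit diagonal.
    n: GWAS sample size (the penalty enters on the scale lambda / n).
    """
    p = beta_marginal.shape[0]
    beta = np.zeros(p)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Marginal effect minus the LD-weighted contribution of all *other* variants.
            z_j = beta_marginal[j] - ld[j] @ beta + ld[j, j] * beta[j]
            # Penalty is evaluated at the current estimate: near lam0 when beta_j is ~0,
            # near lam1 once beta_j is clearly nonzero.
            lam_star = ssl_adaptive_penalty(beta[j], theta, lam0, lam1)
            new = np.sign(z_j) * max(abs(z_j) - lam_star / n, 0.0)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Toy summary statistics (hypothetical): variant 0 is causal, variant 1 only tags it via LD.
ld = np.array([[1.0, 0.9], [0.9, 1.0]])
beta_marginal = np.array([0.5, 0.45])
beta = ssl_coordinate_ascent(beta_marginal, ld, n=10_000)
print(beta)  # approximately [0.4999, 0.0]
```

Once an effect is clearly nonzero, λ*(β) falls to roughly λ1, so the causal effect in the toy example is shrunk by only about λ1/n while the correlated tagging variant is set exactly to zero. Setting `lam0 == lam1` makes the penalty constant and reduces the update to ordinary LASSO coordinate descent.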

SSLPRS is available in the open-source package https://github.com/li-lab-mcgill/penprs.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12596729/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12596729/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12596729/full.md

---
Source: https://tomesphere.com/paper/PMC12596729