Overlapping group logistic regression with applications to genetic pathway selection
Yaohui Zeng, Patrick Breheny

TL;DR
This paper introduces grpregOverlap, an extension of the grpreg package, enabling regression with overlapping pathway groups to improve gene selection accuracy in genomewide expression analysis.
Contribution
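It develops grpregOverlap, an extension of the R package grpreg that fits group-penalized regression models with overlapping groups using the latent variable approach of Jacob et al. (2009), filling a gap in software for pathway-informed regression.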
Findings
Incorporating pathway information improves gene expression classifier accuracy.
The approach outperforms ordinary lasso and GSEA in simulations and real data.
The study clarifies differences between hypothesis-testing and regression methods for pathway analysis.
Abstract
Discovering important genes that account for the phenotype of interest has long been challenging in genomewide expression analysis. Analyses such as Gene Set Enrichment Analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using the latent variable approach proposed by Jacob et al. (2009). We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information substantially improves the…
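The latent variable approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the grpregOverlap implementation. The function names `expand_overlapping_groups` and `collapse_latent_coefs` and the toy data are hypothetical. The idea is that each gene's column is duplicated once per pathway it belongs to, so the groups no longer overlap in the expanded design and an ordinary group-penalized fit can be applied. A gene's coefficient is then the sum of the coefficients of its latent copies.

```python
def expand_overlapping_groups(X, groups):
    """Duplicate columns so that every group owns a private copy of its variables.

    X      : list of rows, each a list of feature values
    groups : list of groups, each a list of column indices (groups may overlap)
    """
    # Original column index of each latent column, in group order
    cols = [j for g in groups for j in g]
    # Group label of each latent column
    group_id = [k for k, g in enumerate(groups) for _ in g]
    X_latent = [[row[j] for j in cols] for row in X]
    return X_latent, cols, group_id


def collapse_latent_coefs(gamma, cols, n_features):
    """Sum latent coefficients back onto the original features."""
    beta = [0.0] * n_features
    for j, g in zip(cols, gamma):
        beta[j] += g
    return beta


# Toy example: 4 genes, 2 pathways sharing gene 1 (hypothetical data)
X = [[0.0, 1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0, 7.0],
     [8.0, 9.0, 10.0, 11.0]]
groups = [[0, 1], [1, 2, 3]]

X_latent, cols, group_id = expand_overlapping_groups(X, groups)

# Pretend a group-penalized fit on X_latent returned these latent coefficients
gamma = [1.0, 2.0, 3.0, 0.0, 0.0]
beta = collapse_latent_coefs(gamma, cols, n_features=4)
```

In this toy case gene 1 appears in both pathways, so its coefficient combines the two latent copies. A group whose latent coefficients are all zero contributes nothing, which is how selection at the pathway level carries over to genes.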
