Parsimonious Subset Selection for Generalized Linear Models with Biomedical Applications
Anant Mathur, Benoit Liquet, Samuel Muller, Sarat Moka

TL;DR
This paper introduces COMBSS-GLM, a scalable method for selecting sparse, accurate, and interpretable generalized linear models in high-dimensional biomedical data, outperforming existing methods in variable selection and prediction.
Contribution
The paper develops a novel continuous relaxation approach combined with a Frank-Wolfe algorithm for best subset selection in GLMs, with theoretical guarantees and practical effectiveness.
Findings
Improves variable selection accuracy over penalized likelihood methods.
Achieves perfect classification in a cancer dataset with few genes.
Recovers known genetic loci in a GWAS study.
Abstract
High-dimensional biomedical studies require models that are simultaneously accurate, sparse, and interpretable, yet exact best subset selection for generalized linear models is computationally intractable. We develop a scalable method that combines a continuous Boolean relaxation of the subset problem with a Frank-Wolfe algorithm driven by envelope gradients. The resulting method, which we refer to as COMBSS-GLM, is simple to implement, requires one penalized generalized linear model fit per iteration, and produces sparse models along a model-size path. Theoretically, we identify a curvature-based parameter regime in which the relaxed objective is concave in the selection weights, implying that global minimizers occur at binary corners. Empirically, in logistic and multinomial simulations across low- and high-dimensional correlated settings, the proposed method consistently improves…
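The general idea described in the abstract (relax the binary selection vector to weights t in [0, 1]^p, fit one penalized GLM per iteration, and take Frank-Wolfe steps using the gradient of the inner-minimized objective) can be illustrated with a minimal sketch. This is not the authors' COMBSS-GLM implementation. The relaxed objective used here (ridge-penalized logistic loss on the reweighted design X * t), the feasible set {t : 0 <= t <= 1, sum(t) = k}, the step size 2/(m + 3), and every function name are assumptions chosen for illustration, and the sketch does not reproduce the paper's objective or its concavity regime.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def objective(Z, y, beta, lam):
    # Mean logistic loss plus ridge penalty (assumed inner objective).
    eta = Z @ beta
    return np.mean(np.logaddexp(0.0, eta) - y * eta) + 0.5 * lam * (beta @ beta)


def fit_ridge_logistic(Z, y, lam, n_newton=50, tol=1e-10):
    # Damped Newton iterations for ridge-penalized logistic regression.
    n, p = Z.shape
    beta = np.zeros(p)
    for _ in range(n_newton):
        prob = sigmoid(Z @ beta)
        grad = Z.T @ (prob - y) / n + lam * beta
        w = prob * (1.0 - prob)
        hess = (Z.T * w) @ Z / n + lam * np.eye(p)
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase.
        size, current = 1.0, objective(Z, y, beta, lam)
        while objective(Z, y, beta - size * step, lam) > current and size > 1e-8:
            size *= 0.5
        beta = beta - size * step
        if size * np.linalg.norm(step) < tol:
            break
    return beta


def frank_wolfe_subset(X, y, k, lam=0.05, n_iter=30):
    # Frank-Wolfe over {t : 0 <= t <= 1, sum(t) = k} (assumed feasible set).
    n, p = X.shape
    t = np.full(p, k / p)  # interior starting point
    for m in range(n_iter):
        # One penalized GLM fit per iteration, on the reweighted design.
        beta = fit_ridge_logistic(X * t, y, lam)
        # Envelope gradient: partial derivative of the inner objective
        # with respect to t, evaluated at the fitted beta.
        resid = sigmoid(X @ (t * beta)) - y
        grad_t = beta * (X.T @ resid) / n
        # Linear minimization oracle: the binary vertex that puts weight
        # on the k coordinates with the smallest gradient entries.
        vertex = np.zeros(p)
        vertex[np.argsort(grad_t)[:k]] = 1.0
        gamma = 2.0 / (m + 3.0)  # keeps iterates strictly inside the box
        t = (1.0 - gamma) * t + gamma * vertex
    selected = np.sort(np.argsort(-t)[:k])
    return selected, t


# Synthetic example (hypothetical data): 3 informative features out of 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -3.0, 3.0]
y = (rng.random(200) < sigmoid(X @ true_beta)).astype(float)

selected, t = frank_wolfe_subset(X, y, k=3)
print("selected features:", selected)
```

Calling `frank_wolfe_subset` for k = 1, 2, ... would give one candidate model per size, loosely mirroring the model-size path mentioned in the abstract.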
Taxonomy
Topics: Statistical Methods and Inference · Genetic Associations and Epidemiology · Tensor decomposition and applications
