Structural randomised selection
Fan Wang, Sylvia Richardson, Steven M. Hill

TL;DR
STRANDS is an ensemble method that enhances sparse penalised regression for high-dimensional data by using correlation-informed subsampling and variable importance, leading to improved variable selection and model performance.
Contribution
It introduces a novel two-step subsampling ensemble approach that leverages correlation structure and variable importance to improve sparse regression methods.
Findings
STRANDS outperforms standard sparse regression methods on synthetic and biological data.
Incorporating correlation structure in subsampling improves model exploration efficiency.
STRANDS is compatible with any sparse penalised regression approach.
Abstract
An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong multicollinearity and model uncertainty. We propose a new ensemble learning approach for improving the performance of sparse penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by variable correlation structure, and in the second step, by variable importance measures from the first step. STRANDS can be used with any sparse…
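The two-step procedure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how correlation structure informs the first-step sampling, so the `correlation_weights` rule below (down-weighting variables that have many strong correlates) is a hypothetical stand-in. The function names, the subsample size `q`, the number of repeats `B`, the penalty `lam`, the 0.7 threshold, and the small NumPy coordinate-descent Lasso used as the base learner are likewise assumptions made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (column j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def correlation_weights(X, threshold=0.7):
    """HYPOTHETICAL sampling rule: weight each variable by the inverse of the
    number of variables (itself included) it is strongly correlated with."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    group_size = np.maximum((corr >= threshold).sum(axis=0), 1)
    w = 1.0 / group_size
    return w / w.sum()

def ensemble_step(X, y, probs, q, B, lam, rng, fit=lasso_cd):
    """Draw B subsets of q variables with the given probabilities, fit the
    base learner on each, and average coefficients (zero when not drawn)."""
    p = X.shape[1]
    coefs = np.zeros((B, p))
    for b in range(B):
        idx = rng.choice(p, size=q, replace=False, p=probs)
        coefs[b, idx] = fit(X[:, idx], y, lam)
    return coefs.mean(axis=0)

def strands_sketch(X, y, q=10, B=100, lam=0.1, seed=0):
    """Two-step subsampling ensemble in the spirit of the abstract (assumed details)."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    y = y - y.mean()
    # Step 1: subsampling informed by correlation structure.
    step1 = ensemble_step(X, y, correlation_weights(X), q, B, lam, rng)
    # Step 2: subsampling informed by step-1 importance (absolute averaged coefficient).
    importance = np.abs(step1) + 1e-8  # keep every probability strictly positive
    return ensemble_step(X, y, importance / importance.sum(), q, B, lam, rng)

# Small synthetic example: 3 of 30 variables carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = 3 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + 0.5 * rng.normal(size=100)
est = strands_sketch(X, y)
top3 = set(np.argsort(-np.abs(est))[:3].tolist())
```

The `fit` argument of `ensemble_step` is left pluggable to mirror the claim that the approach works with any sparse penalised regression method; any function with the same `(X, y, lam)` signature that returns a coefficient vector could be swapped in.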
Taxonomy
Topics: Gene expression and cancer classification · Metabolomics and Mass Spectrometry Studies · Bioinformatics and Genomic Networks
