Resampling-Based Multisplit Inference for High-Dimensional Regression
Anna Vesely, Jelle J. Goeman, Livio Finos

TL;DR
This paper introduces a flexible resampling-based testing method for high-dimensional linear regression. It supports confidence statements on predictor variables, works with various variable selection techniques, and scales to large datasets.
Contribution
It presents a novel, scalable resampling-based inference method for high-dimensional regression that can be integrated with multiple testing procedures for confidence statements.
Findings
The method performs well in simulations.
It is effective on real gene expression data.
It comes in two versions: an exact procedure and an approximate one that scales to higher dimensions.
Abstract
We propose a novel resampling-based method to construct an asymptotically exact test for any subset of hypotheses on coefficients in high-dimensional linear regression. It can be embedded into any multiple testing procedure to make confidence statements on relevant predictor variables. The method constructs permutation test statistics for any individual hypothesis by means of repeated splits of the data and a variable selection technique; then it defines a test for any subset by suitably aggregating its variables' test statistics. The resulting procedure is extremely flexible, as it allows different selection techniques and several combining functions. We present it in two ways: an exact method and an approximate one that requires less memory and computation time and can be scaled up to higher dimensions. We illustrate the performance of the method with simulations and…
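The pipeline outlined in the abstract (repeated data splits, variable selection, per-variable permutation statistics, aggregation over a subset) can be illustrated with a simplified sketch. This is not the authors' procedure. It assumes top-k marginal screening as the selection technique, random sign-flipping of residualized score contributions as the resampling scheme, and a sum of squares as the combining function. The function name `multisplit_signflip_pvalue` and all of its parameters are hypothetical.

```python
import numpy as np

def multisplit_signflip_pvalue(X, y, subset, n_splits=20, n_flips=200, k=5, seed=0):
    """Illustrative multisplit test for H0: beta_j = 0 for all j in `subset`.

    Assumptions (not taken from the paper): top-k marginal screening,
    sign-flipped score contributions, sum-of-squares combination.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # One sign vector per resample, indexed by observation so that the same
    # flips are reused across splits. Row 0 is the identity (observed data).
    flips = rng.choice([-1.0, 1.0], size=(n_flips, n))
    flips[0] = 1.0
    stats = np.zeros((n_flips, p))  # statistics accumulated over splits

    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, inf_idx = perm[: n // 2], perm[n // 2:]

        # Selection half: keep the k variables with the largest absolute
        # marginal covariance with the response.
        Xs = X[sel_idx] - X[sel_idx].mean(axis=0)
        ys = y[sel_idx] - y[sel_idx].mean()
        selected = np.argsort(-np.abs(Xs.T @ ys))[:k]

        # Inference half: score contributions for each selected variable,
        # after regressing out an intercept and the other selected variables.
        Xi, yi = X[inf_idx][:, selected], y[inf_idx]
        for pos, j in enumerate(selected):
            Z = np.column_stack([np.ones(len(inf_idx)), np.delete(Xi, pos, axis=1)])
            coef_x, *_ = np.linalg.lstsq(Z, Xi[:, pos], rcond=None)
            coef_y, *_ = np.linalg.lstsq(Z, yi, rcond=None)
            contrib = (Xi[:, pos] - Z @ coef_x) * (yi - Z @ coef_y)
            # Unselected variables receive no contribution from this split.
            stats[:, j] += flips[:, inf_idx] @ contrib

    # Combine the statistics of the variables in the subset, then compare
    # the observed value (row 0) with the sign-flipped ones.
    combined = (stats[:, subset] ** 2).sum(axis=1)
    return float(np.mean(combined >= combined[0]))

# Toy data: only the first of 30 predictors has a nonzero coefficient.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(100, 30))
y_demo = 2.0 * X_demo[:, 0] + demo_rng.normal(size=100)
p_signal = multisplit_signflip_pvalue(X_demo, y_demo, subset=[0])
p_null = multisplit_signflip_pvalue(X_demo, y_demo, subset=[1, 2])
```

Because the identity transformation is always counted, the returned p-value can never fall below `1 / n_flips`. Swapping in a different selection rule or combining function only changes the marked lines, which mirrors the flexibility the abstract describes.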
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Gene Regulatory Network Analysis
