TL;DR
This paper introduces a model-free variable screening method based on leverage scores. It uses the singular value decomposition to select relevant predictors efficiently in large datasets, and it applies to both linear and more complex models.
Contribution
It proposes a weighted leverage score-based screening technique that effectively identifies true predictors, extending leverage score methods to variable selection in high-dimensional data.
Findings
The method is computationally efficient and scalable.
Successfully identifies relevant variables in simulated data.
Demonstrates effectiveness in gene identification from spatial transcriptome data.
Abstract
With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is often extracted from these data through statistical learning or model fitting. In massive datasets, both sample size and number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores via singular value decompositions has been proposed to select rows of a design matrix as a surrogate of the full data in linear regression. Analogously, variable screening can be viewed as selecting columns of the design matrix. However, effective variable selection along this line of thinking remains elusive. In this article, we bridge this gap to propose a weighted leverage variable screening method by…
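To make the leverage-score idea in the abstract concrete, here is a minimal sketch of the standard, unweighted leverage scores computed from a thin SVD. It is not the paper's weighted leverage score, whose definition is cut off in the truncated abstract. The function name `leverage_scores`, the `rank` parameter, and the random test matrix are assumptions made for illustration only.

```python
import numpy as np


def leverage_scores(X, rank=None):
    """Return (row_scores, col_scores) for X from its thin SVD.

    Illustrative helper (hypothetical name): plain leverage scores,
    not the weighted variant proposed in the paper.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        # Default to the numerical rank of X.
        tol = s.max() * max(X.shape) * np.finfo(s.dtype).eps
        rank = int((s > tol).sum())
    U_k = U[:, :rank]        # leading left singular vectors
    V_k = Vt[:rank, :].T     # leading right singular vectors
    # Row (sample) leverage: squared row norms of U_k.
    row_scores = np.sum(U_k ** 2, axis=1)
    # Column (variable) leverage: squared row norms of V_k.
    col_scores = np.sum(V_k ** 2, axis=1)
    return row_scores, col_scores


# Synthetic data, assumed for illustration: 200 samples, 10 predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
row_scores, col_scores = leverage_scores(X, rank=3)

# Rank predictors by column leverage, highest first.
top_columns = np.argsort(col_scores)[::-1][:5]
```

Row scores are the quantity used when subsampling observations, while column scores play the analogous role for predictors. Each set of scores sums to the chosen rank, so the scores can be normalized into sampling probabilities or simply ranked.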
