Optimality of Graphlet Screening in High Dimensional Variable Selection
Jiashun Jin, Cun-Hui Zhang, Qi Zhang

TL;DR
This paper introduces Graphlet Screening (GS), a variable selection method for high-dimensional linear models whose Gram matrix is sparse, and shows that GS attains the optimal rate of convergence in Hamming distance, whereas subset selection and the lasso do not.
Contribution
The paper proposes Graphlet Screening, a novel two-stage variable selection approach guided by the graph of strong dependence, and proves its minimax optimality in high-dimensional settings.
Findings
GS achieves the optimal rate of convergence in Hamming distance.
Traditional methods like subset selection and lasso are shown to be non-optimal.
GS has computational advantages over brute-force multivariate screening.
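To make the Hamming-distance criterion concrete, here is a minimal sketch. It assumes the selection error counts the coordinates where the sign of the estimate disagrees with the sign of the true coefficient; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hamming_selection_error(beta_hat, beta):
    """Count coordinates where the estimated and true sign vectors differ."""
    return sum(sign(a) != sign(b) for a, b in zip(beta_hat, beta))


# Hypothetical truth and estimate: one false positive (index 2)
# and one missed signal (index 3).
beta = [0.0, 2.0, 0.0, -1.5, 0.0]
beta_hat = [0.0, 1.7, 0.3, 0.0, 0.0]

error = hamming_selection_error(beta_hat, beta)
print(error)  # 2
```

Under this definition a false positive and a missed signal each cost one unit, so the criterion penalizes both kinds of selection error equally.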
Abstract
Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decomposes into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of…
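The decomposition described in the abstract can be illustrated with a small sketch. It assumes the GOSD joins two variables whenever the corresponding off-diagonal entry of G is at least some threshold in absolute value; the threshold `delta`, the toy Gram matrix, the signal locations, and the function names are all hypothetical. This shows only the graph idea, not the Screen and Clean procedure itself.

```python
def gosd_edges(gram, delta):
    """Adjacency sets of the graph: i ~ j when |G[i][j]| >= delta and i != j."""
    p = len(gram)
    return {i: {j for j in range(p) if j != i and abs(gram[i][j]) >= delta}
            for i in range(p)}


def signal_components(adj, support):
    """Connected components of the subgraph induced by the signal support."""
    support = set(support)
    seen, components = set(), []
    for start in sorted(support):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first search restricted to the support
            node = stack.pop()
            comp.append(node)
            for nb in adj[node] & support:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Toy sparse Gram matrix: unit diagonal, 0.4 between adjacent indices, 0 elsewhere.
p = 12
G = [[1.0 if i == j else 0.4 if abs(i - j) == 1 else 0.0 for j in range(p)]
     for i in range(p)]

adj = gosd_edges(G, delta=0.1)
support = [1, 2, 7, 10, 11]  # hypothetical locations of nonzero coefficients
components = signal_components(adj, support)
print(components)  # [[1, 2], [7], [10, 11]]
```

With a sparse graph and a sparse support, the signals fall into a few small components that are disconnected from one another, which is the structure the abstract describes.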
Taxonomy
Methods: Linear Regression
