Optimality of Graphlet Screening in High Dimensional Variable Selection

Jiashun Jin; Cun-Hui Zhang; Qi Zhang

arXiv:1204.6452 · math.ST · June 16, 2014 · J. Mach. Learn. Res.

TL;DR

This paper introduces Graphlet Screening (GS), a variable selection method for high-dimensional linear models whose Gram matrix is sparse, and shows that GS attains the optimal rate of convergence in Hamming distance, whereas subset selection and the lasso do not.

Contribution

The paper proposes Graphlet Screening, a novel two-stage variable selection approach guided by the graph of strong dependence, and proves its minimax optimality in high-dimensional settings.

Findings

01 GS achieves the optimal rate of convergence in Hamming distance.

02 Traditional methods like subset selection and lasso are shown to be non-optimal.

03 GS has computational advantages over brute-force multivariate screening.

Abstract

Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decomposes into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of…
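
The decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the GOSD joins two variables whenever the absolute value of their normalized Gram entry reaches a threshold, and the threshold name `delta`, the function names, and the support-based form of the Hamming distance are all hypothetical choices made for illustration.

```python
import numpy as np

def gosd_adjacency(X, delta):
    """Boolean adjacency matrix of a thresholded Gram graph.

    Assumption: nodes i and j are joined when |G[i, j]| >= delta, where
    G = X'X after scaling every column of X to unit norm (so diag(G) = 1).
    """
    Xn = X / np.linalg.norm(X, axis=0)
    G = Xn.T @ Xn
    A = np.abs(G) >= delta
    np.fill_diagonal(A, False)  # no self-loops
    return A

def signal_components(A, support):
    """Connected components of the subgraph induced by the signal set."""
    support = set(int(s) for s in support)
    seen, components = set(), []
    for start in sorted(support):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search restricted to the support
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(A[u]):
                v = int(v)
                if v in support and v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components

def hamming_distance(beta_hat, beta):
    """Number of coordinates where the estimated and true supports disagree
    (an assumed, support-based form of the selection error)."""
    return int(np.sum((np.asarray(beta_hat) != 0) != (np.asarray(beta) != 0)))

# Toy design: columns 0 and 1 are identical, columns 3 and 4 are correlated.
X = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
A = gosd_adjacency(X, delta=0.5)
print(signal_components(A, [0, 1, 3]))  # signals 0 and 1 share a component
print(hamming_distance([1.0, 0.0, 0.0, 2.0, 0.0], [1.0, 1.0, 0.0, 2.0, 0.0]))
```

The toy design is kept tiny for readability and does not satisfy p much larger than n; with a sparse Gram matrix and a sparse signal, most components returned by `signal_components` would be expected to be small, which is the interplay the abstract points to.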

Taxonomy

Methods: Linear Regression