Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm

Peter Bühlmann; Markus Kalisch; Marloes H. Maathuis

arXiv:0906.3204·stat.ME·January 12, 2012


TL;DR

This paper introduces the PC-simple algorithm for variable selection in high-dimensional linear models. The algorithm uses the new concept of partial faithfulness to identify relevant covariates and remains computationally feasible even with thousands of covariates.

Contribution

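The paper proposes a computationally feasible variable selection method based on partial faithfulness. Unlike penalty-based approaches such as the Lasso, the method is a simplified version of the PC algorithm, and it comes with consistency guarantees under conditions on the random design matrix.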

Findings

01

The PC-simple algorithm remains computationally feasible even with thousands of covariates.

02

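It achieves consistent variable selection under conditions on the random design matrix that differ in nature from the coherence conditions used for penalty-based approaches such as the Lasso.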

03

The method performs competitively with existing penalty-based approaches in simulations and real data.

Abstract

We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the PC algorithm (Spirtes et al., 2000), the PC-simple algorithm, which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the Lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg.
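To make the idea in the abstract concrete, here is a minimal Python sketch of a PC-simple-style procedure. It is not the pcalg implementation, and every function name in it (`corr_matrix`, `partial_corr`, `is_nonzero`, `pc_simple_sketch`) is hypothetical. It rests on several assumptions not stated in the abstract: the active set starts with all covariates whose marginal correlation with the response is significantly nonzero; at step m, a covariate is dropped if some partial correlation with the response, given m other active covariates, is not significantly different from zero; significance is judged with Fisher's z-transform at level `alpha`; and the loop stops once the active set has at most m members.

```python
import itertools
import math
import random
from statistics import NormalDist


def corr_matrix(columns):
    """Pearson correlation matrix for a list of equal-length, non-constant columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(u, v)) for v in scaled] for u in scaled]


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_nonzero(r, n, k, alpha):
    """Fisher z-test: True if a (partial) correlation of order k is significant."""
    r = min(max(r, -0.999999), 0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    return math.sqrt(n - k - 3) * abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def pc_simple_sketch(x_columns, y, alpha=0.05):
    """Return 0-based indices of covariates kept by the sketch (assumed scheme)."""
    n, p = len(y), len(x_columns)
    corr = corr_matrix([y] + x_columns)  # index 0 is the response
    # Step 0: keep covariates with a significant marginal correlation.
    active = [j for j in range(1, p + 1) if is_nonzero(corr[0][j], n, 0, alpha)]
    m = 1
    while len(active) > m and n - m - 3 > 0:
        # Step m: keep j only if every order-m partial correlation is significant.
        active = [
            j for j in active
            if all(
                is_nonzero(partial_corr(corr, 0, j, cond), n, m, alpha)
                for cond in itertools.combinations([k for k in active if k != j], m)
            )
        ]
        m += 1
    return [j - 1 for j in active]


# Synthetic example: only covariates 0 and 3 enter the linear model.
rng = random.Random(0)
n, p = 200, 8
x_columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * x_columns[0][i] - 1.5 * x_columns[3][i] + rng.gauss(0, 1) for i in range(n)]
selected = pc_simple_sketch(x_columns, y)
print(selected)
```

Because the active set usually shrinks quickly, only low-order partial correlations tend to be needed, which is one plausible reading of why such a procedure stays feasible with many covariates. The recursion in `partial_corr` is chosen for readability, not speed.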
