A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors

Zhiyuan Li

arXiv:2103.13357 · stat.ME · March 25, 2021

Open Access

TL;DR

This paper introduces a two-stage variable selection method for high-dimensional, correlated predictors, combining data-driven clustering with existing group selection techniques to improve accuracy and prediction performance.

Contribution

It proposes a novel two-stage approach that automatically identifies predictor groups and enhances selection accuracy, especially in ultrahigh dimensional settings.

Findings

01. Two-stage method outperforms existing methods in simulation studies.

02. Improved prediction accuracy and active predictor selection.

03. Effective in ultrahigh dimensional data with variable screening.

Abstract

When fitting statistical models, some predictors are often found to be correlated with each other and to function together. Many group variable selection methods have been developed to select the groups of predictors that are closely related to a continuous or categorical response. These existing methods usually assume the group structure is known in advance, for example variables with similar practical meaning, or dummy variables created from categorical data. In practice, however, the exact group structure is rarely known, especially when the number of variables is large. As a result, the group variable selection results may be compromised. To address this challenge, we propose a two-stage approach that combines a variable clustering stage and a group variable selection stage for the group variable selection problem. The variable clustering stage uses information from the data to find a group…
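
The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the clustering rule (connected components of predictors whose absolute sample correlation exceeds a threshold), the second-stage method (a group lasso fitted by proximal gradient descent), the function names `cluster_by_correlation` and `group_lasso`, the tuning values, and the simulated data.

```python
import numpy as np

def cluster_by_correlation(X, threshold=0.7):
    """Stage 1 (assumed rule): link predictors whose absolute sample
    correlation is at least `threshold`; groups are connected components."""
    p = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    parent = list(range(p))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(p)]
    # Relabel components 0, 1, 2, ... in order of first appearance.
    labels = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return np.array([labels[r] for r in roots])

def group_lasso(X, y, groups, lam=0.3, n_iter=1000):
    """Stage 2 (assumed method): group lasso for squared-error loss,
    minimising (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            weight = lam * np.sqrt(idx.sum())
            # Block soft-thresholding: shrink the whole group, or zero it out.
            scale = max(0.0, 1.0 - step * weight / norm) if norm > 0 else 0.0
            z[idx] = scale * z[idx]
        beta = z
    return beta

# Simulated data (hypothetical): three blocks of four correlated predictors,
# with only the first block related to the response.
rng = np.random.default_rng(0)
n, block_size, n_blocks = 200, 4, 3
latent = rng.normal(size=(n, n_blocks))
X = np.repeat(latent, block_size, axis=1) + 0.3 * rng.normal(size=(n, block_size * n_blocks))
y = X[:, :block_size].sum(axis=1) + 0.5 * rng.normal(size=n)

groups = cluster_by_correlation(X)   # stage 1: data-driven groups
beta = group_lasso(X, y, groups)     # stage 2: select whole groups
selected = [int(g) for g in np.unique(groups) if np.linalg.norm(beta[groups == g]) > 0]
print("groups:", groups.tolist())
print("selected groups:", selected)
```

The point of the sketch is the division of labour: the first stage only proposes a grouping from the data, and the second stage treats that grouping as if it had been known in advance. Any clustering rule or group selection method could be substituted at either stage.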

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference