Variable Importance Assessments and Backward Variable Selection for High-Dimensional Data

Liuhua Peng; Long Qu; Dan Nettleton

arXiv:1806.06468·stat.ME·June 19, 2018

Open Access

TL;DR

This paper introduces distance-based variable importance measures, inspired by the Multi-Response Permutation Procedure (MRPP), and a backward selection algorithm for high-dimensional variable selection, with applications such as identifying differentially expressed genes in genomic analysis.

Contribution

It proposes a new variable importance assessment inspired by MRPP and a backward selection method tailored for high-dimensional variable selection.

Findings

01. Effective in identifying important variables in high-dimensional data

02. Outperforms existing methods in simulations and real data

03. Demonstrates good properties and advantages over other approaches

Abstract

Variable selection in high-dimensional scenarios is of great interest in statistics. One application involves identifying differentially expressed genes in genomic analysis. Existing methods for addressing this problem have limitations. In this paper, we propose distance-based variable importance measures, inspired by the Multi-Response Permutation Procedure (MRPP), to address these problems. The proposed variable importance assessments can effectively measure the importance of an individual dimension by quantifying its influence on the differences between multivariate distributions. A backward selection algorithm is developed that can be used in high-dimensional variable selection to discover important variables. Both simulations and real data applications demonstrate that our proposed method enjoys good properties and has advantages over other methods.
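
For readers unfamiliar with MRPP, the procedure the abstract cites as inspiration, the following is a minimal sketch of the classical MRPP statistic and its permutation p-value. It is not the paper's importance measure or its backward selection algorithm, neither of which is specified in the abstract. The function names (`mrpp_delta`, `mrpp_pvalue`), the group weights n_g / N, the Euclidean distance, and the toy data are all assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations


def mrpp_delta(rows, labels):
    """Classical MRPP statistic (assumed weights n_g / N, Euclidean distance).

    For each group, average the pairwise distances between its members,
    then combine the group averages weighted by group size.
    """
    n = len(rows)
    delta = 0.0
    for g in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == g]
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a singleton group has no within-group distance
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        delta += (len(members) / n) * mean_dist
    return delta


def mrpp_pvalue(rows, labels, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with delta <= observed."""
    rng = random.Random(seed)
    observed = mrpp_delta(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so ties caused by float rounding still count
        if mrpp_delta(rows, shuffled) <= observed + 1e-12:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two groups separated on the first coordinate only.
rows = [(0.0, 1.0), (0.2, 3.0), (0.1, 2.0), (5.0, 2.5), (5.2, 1.5), (5.1, 3.5)]
labels = ["a", "a", "a", "b", "b", "b"]

observed, p_value = mrpp_pvalue(rows, labels)
print(f"delta = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

A small delta relative to the permutation distribution indicates that observations within a group sit closer together than random grouping would produce, which is the sense in which MRPP detects differences between multivariate distributions. In this toy data only the first coordinate separates the groups, the kind of per-dimension influence the abstract says the proposed measures quantify.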

Taxonomy

Topics: Gene expression and cancer classification · Statistical Methods and Inference · Bayesian Methods and Mixture Models