Discovering influential variables: A method of partitions

Herman Chernoff; Shaw-Hwa Lo; Tian Zheng

arXiv:1009.5744·stat.AP·September 30, 2010

TL;DR

This paper describes a computer-intensive method for identifying influential variables in high-dimensional data. Rather than analyzing all variables at once, it examines many randomly selected small subsets of variables to find those that affect a dependent variable $Y$.

Contribution

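The paper presents a general, computer-intensive approach, based on a method pioneered by Lo and Zheng, for detecting influential variables through the analysis of small subsets. It is suited to cases where effects depend on the combined values of several variables.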

Findings

01 Targets high-dimensional, noisy data in which most variables are noninformative

02 Avoids a direct analysis of all variables simultaneously

03 Focuses on locating a small set of influential variables

Abstract

A trend in all scientific disciplines, driven by advances in technology, is the increasing availability of high-dimensional data in which important information is buried. An urgent challenge for statisticians is to develop effective methods for finding the useful information in the vast amounts of messy and noisy data available, most of which are noninformative. This paper presents a general computer-intensive approach, based on a method pioneered by Lo and Zheng, for detecting which of many potential explanatory variables have an influence on a dependent variable $Y$. The approach is suited to detecting influential variables whose causal effects depend on the confluence of values of several variables. It has the advantage of avoiding a difficult direct analysis, involving possibly thousands of variables, by dealing with many randomly selected small subsets from which smaller…
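The idea of scoring many randomly selected small subsets can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract is truncated and does not specify a score or what is done with each subset. The sketch assumes discrete explanatory variables, a hypothetical partition-based score (squared, cell-size-weighted deviations of cell means of $Y$ from the overall mean), and a hypothetical tallying rule that counts how often each variable appears among the top-scoring subsets. The names `partition_score`, `screen`, and `make_demo_data` are illustrative only.

```python
import random
from collections import Counter, defaultdict


def partition_score(rows, y, subset):
    """Illustrative score (an assumption, not quoted from the abstract):
    partition the observations by the joint values of the variables in
    `subset`, then sum n_cell^2 * (cell mean - overall mean)^2 over the
    cells and divide by the sample size."""
    n = len(y)
    y_bar = sum(y) / n
    cells = defaultdict(list)
    for row, yi in zip(rows, y):
        cells[tuple(row[j] for j in subset)].append(yi)
    total = 0.0
    for values in cells.values():
        cell_mean = sum(values) / len(values)
        total += len(values) ** 2 * (cell_mean - y_bar) ** 2
    return total / n


def screen(rows, y, subset_size=3, n_subsets=3000, keep_top=15, seed=0):
    """Score many random small subsets and count how often each variable
    appears among the highest-scoring ones (hypothetical tallying rule)."""
    rng = random.Random(seed)
    p = len(rows[0])
    scored = []
    for _ in range(n_subsets):
        subset = tuple(sorted(rng.sample(range(p), subset_size)))
        scored.append((partition_score(rows, y, subset), subset))
    scored.sort(reverse=True)
    return Counter(j for _, subset in scored[:keep_top] for j in subset)


def make_demo_data(n=200, p=20, seed=1):
    """Synthetic data: Y depends only on the combination of variables 0 and 1
    (an XOR-style interaction), so neither has a marginal effect by itself."""
    rng = random.Random(seed)
    rows = [tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(n)]
    y = [float(r[0] != r[1]) + rng.gauss(0, 0.3) for r in rows]
    return rows, y


if __name__ == "__main__":
    rows, y = make_demo_data()
    # Variables that recur among the top-scoring subsets are candidates.
    print(screen(rows, y).most_common(5))
```

In the synthetic example, the two interacting variables have no effect when examined one at a time, which is the situation the abstract describes as effects depending on the confluence of several variables. Only subsets containing both score highly, so each subset analysis stays small even when the number of candidate variables is large.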
