Testing for Feature Relevance: The HARVEST Algorithm
Herbert Weisberg, Victor Pontes, and Mathis Thoma

TL;DR
HARVEST is a new, computationally intensive algorithm for pre-screening features in high-dimensional data. It tests each feature's relevance by checking whether the feature adds predictive value across many random subsets of the other features.
Contribution
Introduces HARVEST, a statistical test for feature relevance designed for feature selection in high-dimensional data where only a very small proportion of features is relevant.
Findings
HARVEST pre-screens a large number of features and flags those that are potentially useful.
Empirical analyses and simulations produced so far indicate that it is highly effective in predictive analytics, in both science and business.
Abstract
Feature selection with high-dimensional data and a very small proportion of relevant features poses a severe challenge to standard statistical methods. We have developed a new approach (HARVEST) that is straightforward to apply, albeit somewhat computer-intensive. This algorithm can be used to pre-screen a large number of features to identify those that are potentially useful. The basic idea is to evaluate each feature in the context of many random subsets of other features. HARVEST is predicated on the assumption that an irrelevant feature can add no real predictive value, regardless of which other features are included in the subset. Motivated by this idea, we have derived a simple statistical test for feature relevance. Empirical analyses and simulations produced so far indicate that the HARVEST algorithm is highly effective in predictive analytics, both in science and business.
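The basic idea in the abstract can be illustrated with a small sketch. The abstract does not specify the predictive model, the subset size, the number of subsets, or the form of the statistical test, so everything below is an assumption made for illustration and not the authors' procedure: a nearest-centroid classifier stands in for the model, a random half/half split is redrawn for every subset, and a one-sided sign test asks whether adding the candidate feature improves held-out accuracy more often than it hurts. All function names (`centroid_accuracy`, `sign_test_p`, `relevance_p_value`, `make_demo_data`) are hypothetical.

```python
import math
import random


def centroid_accuracy(X, y, train, test, feats):
    """Held-out accuracy of a nearest-centroid classifier using only `feats`."""
    cents = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = 0
    for i in test:
        dist = {
            label: sum((X[i][f] - c) ** 2 for f, c in zip(feats, cents[label]))
            for label in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        hits += pred == y[i]
    return hits / len(test)


def sign_test_p(wins, losses):
    """One-sided P(W >= wins) for W ~ Binomial(wins + losses, 1/2); ties are dropped."""
    n = wins + losses
    if n == 0:
        return 1.0
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def relevance_p_value(X, y, j, n_trials=50, subset_size=3, rng=None):
    """Assumed stand-in test: does feature j help across random subsets of other features?"""
    rng = rng or random.Random(0)
    n, p = len(X), len(X[0])
    others = [f for f in range(p) if f != j]
    wins = losses = 0
    for _ in range(n_trials):
        subset = rng.sample(others, subset_size)  # random context of other features
        idx = list(range(n))
        rng.shuffle(idx)                          # fresh train/test split per trial
        train, test = idx[: n // 2], idx[n // 2:]
        base = centroid_accuracy(X, y, train, test, subset)
        full = centroid_accuracy(X, y, train, test, subset + [j])
        wins += full > base
        losses += full < base
    return sign_test_p(wins, losses)


def make_demo_data(n=200, p=20, seed=1):
    """Synthetic data: only feature 0 carries signal; the rest are pure noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    for i in range(n):
        X[i][0] += 2.0 * y[i]
    return X, y


X, y = make_demo_data()
p_relevant = relevance_p_value(X, y, j=0, rng=random.Random(2))
p_noise = relevance_p_value(X, y, j=5, rng=random.Random(3))
print(f"feature 0 (relevant): p = {p_relevant:.3g}")
print(f"feature 5 (noise):    p = {p_noise:.3g}")
```

Because every trial reuses the same samples, the trials are not truly independent, so the sign-test p-value here is only a rough screening score. The relevant feature should nonetheless receive a far smaller value than a typical noise feature.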
Taxonomy
Topics: Gene expression and cancer classification · Face and Expression Recognition · Machine Learning and Data Classification
