Feature selection by Higher Criticism thresholding: optimal phase diagram

David Donoho; Jiashun Jin

arXiv:0812.2263 · math.ST · May 13, 2015

TL;DR

This paper analyzes Higher Criticism thresholding (HCT) for feature selection in high-dimensional, low-sample-size classification, showing that it performs nearly optimally across a broad region of the phase space of rare and weak features.

Contribution

The paper formalizes an asymptotic framework for the rare/weak model, shows that Higher Criticism thresholding nearly matches the performance of the ideal threshold, and characterizes the regions of phase space where it succeeds.
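The rare/weak setting can be made concrete with a small simulation of feature z-scores. This is a minimal sketch under assumptions not stated on this page: z-scores are independent, null features are N(0, 1), and each of the round(eps * n_features) useful features has a common mean shift tau. The (beta, r) calibration in `calibrated_params` (eps = n_features^(-beta) for "rare", tau = sqrt(2 r log n_features) for "weak") is the kind of parameterization typically used for phase diagrams like this one, but here it is a labeled assumption; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def calibrated_params(n_features: int, beta: float, r: float) -> Tuple[float, float]:
    """Hypothetical (beta, r) calibration, assumed for illustration:
    eps = n_features**(-beta) sets how rare the useful features are,
    tau = sqrt(2 r log n_features) sets how weak each one is."""
    eps = n_features ** (-beta)
    tau = math.sqrt(2.0 * r * math.log(n_features))
    return eps, tau


def simulate_rare_weak_zscores(
    n_features: int, eps: float, tau: float, seed: int = 0
) -> Tuple[List[float], List[bool]]:
    """Draw independent z-scores under an assumed sparse mixture:
    round(eps * n_features) useful features have mean tau, the rest are N(0, 1)."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    rng = random.Random(seed)
    m = round(eps * n_features)  # fixed count of useful features, for reproducibility
    useful_idx = set(rng.sample(range(n_features), m))
    useful = [j in useful_idx for j in range(n_features)]
    # Useful features get a small common mean shift; null features are centered at 0.
    z = [rng.gauss(tau if u else 0.0, 1.0) for u in useful]
    return z, useful
```

For instance, `calibrated_params(10_000, beta=0.6, r=0.3)` gives roughly 40 useful features out of 10,000, each shifted by about 2.35 standard deviations: individually too weak to stand out reliably among thousands of null scores.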

Findings

1. HCT performs nearly as well as the ideal threshold asymptotically.
2. The partition of phase space into regions where HCT succeeds and fails matches the partition for the ideal threshold.
3. HCT outperforms FDR (false discovery rate) thresholding in the rare/weak feature regime.
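To see what "FDR thresholding" selects, here is a minimal sketch assuming it means the Benjamini-Hochberg step-up rule applied to two-sided P-values of the z-scores (an assumption; this page does not specify the FDR procedure or its level). The level `q` and the function names are hypothetical. The rule finds the largest index i whose i-th smallest P-value is at most q * i / n and uses the |z| of that order statistic as the threshold.

```python
import math
from typing import List, Optional


def two_sided_pvalue(z: float) -> float:
    """P(|N(0,1)| >= |z|), assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fdr_threshold(z_scores: List[float], q: float = 0.1) -> Optional[float]:
    """Benjamini-Hochberg step-up rule at level q (assumed FDR procedure).

    Finds the largest i with p_(i) <= q * i / n and returns |z| of that
    order statistic, or None when no feature passes.
    """
    n = len(z_scores)
    # Sort (P-value, |z|) pairs by ascending P-value, i.e. descending |z|.
    pairs = sorted((two_sided_pvalue(z), abs(z)) for z in z_scores)
    k = 0
    for i, (p, _) in enumerate(pairs, start=1):
        if p <= q * i / n:
            k = i  # step-up: remember the largest passing index
    return pairs[k - 1][1] if k > 0 else None
```

For example, `fdr_threshold([5.0, 4.5, 0.1, -0.3, 0.2, -0.1, 0.05, 0.4], q=0.1)` returns `4.5`. Both FDR and HC thresholding sort the same P-values; they differ only in how the cut-off index is chosen.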

Abstract

We consider two-class linear classification in a high-dimensional, low-sample-size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision; this setting was called the rare/weak model (RW Model). We select features by thresholding feature $z$-scores. The threshold is set by higher criticism (HC). Let $\pi_i$ denote the $P$-value associated with the $i$-th $z$-score and $\pi_{(i)}$ denote the $i$-th order statistic of the collection of $P$-values. The HC threshold (HCT) is the order statistic of the $z$-score corresponding to the index $i$ maximizing $(i/n - \pi_{(i)}) / \sqrt{\pi_{(i)}(1 - \pi_{(i)})}$. The ideal threshold optimizes the classification error. In [PNAS] we showed that HCT was numerically close to the ideal threshold. We formalize an asymptotic…
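The HC threshold described in the abstract can be sketched in a few lines. This is a minimal sketch, not the authors' code, under these assumptions: P-values are two-sided under a standard normal null, P(|N(0,1)| >= |z|); the objective uses the usual standardized form with sqrt(p_(i) * (1 - p_(i))) in the denominator; every index i = 1..n is searched; and indices where the P-value is exactly 0 or 1 are skipped because the denominator vanishes. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def two_sided_pvalue(z: float) -> float:
    """P(|N(0,1)| >= |z|), assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def hc_threshold(z_scores: List[float]) -> Tuple[float, int]:
    """Return (threshold, maximizing index i) for HC thresholding.

    Sorts P-values ascending and maximizes
        (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    over i = 1..n; the threshold is |z| of the i-th order statistic.
    """
    n = len(z_scores)
    if n == 0:
        raise ValueError("need at least one z-score")
    # Pair each P-value with its |z|; ascending P-value means descending |z|.
    pairs = sorted((two_sided_pvalue(z), abs(z)) for z in z_scores)
    best_i, best_hc = 0, -math.inf
    for i, (p, _) in enumerate(pairs, start=1):
        denom = p * (1.0 - p)
        if denom <= 0.0:  # P-value of exactly 0 or 1: objective undefined
            continue
        hc = (i / n - p) / math.sqrt(denom)
        if hc > best_hc:
            best_i, best_hc = i, hc
    if best_i == 0:
        raise ValueError("all P-values are degenerate (0 or 1)")
    return pairs[best_i - 1][1], best_i


def select_features(z_scores: List[float]) -> List[int]:
    """Indices of features whose |z| is at least the HC threshold."""
    t, _ = hc_threshold(z_scores)
    return [j for j, z in enumerate(z_scores) if abs(z) >= t]
```

For example, `hc_threshold([5.0, 4.5, 0.1, -0.3, 0.2, -0.1, 0.05, 0.4])` returns `(5.0, 1)`: here the single strongest feature maximizes the objective. Published HC variants often restrict the search to a subset of indices (for example, only the smaller P-values); this sketch searches all of them for simplicity.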
