Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

Miika Ahdesmäki; Korbinian Strimmer

arXiv:0903.2003·stat.AP·October 11, 2010

TL;DR

This paper introduces a new feature selection method for high-dimensional linear discriminant analysis using correlation-adjusted t-scores and false nondiscovery rate control, with efficient regularization and implementation in R.

Contribution

It presents a novel pooled centroids formulation with cat scores and FNDR thresholding for feature selection in correlated feature spaces, improving high-dimensional LDA.

Findings

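01

Thresholding cat scores under FNDR control gives effective feature selection in high-dimensional data with correlated features.

02

Regularization parameters are chosen analytically, without resampling, which keeps training computationally inexpensive.

03

The shrinkage discriminant procedures are implemented in the R package "sda".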

Abstract

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted $t$-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James-Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package "sda" available from the R…
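
To make the idea of a correlation-adjusted t-score concrete, the following is a minimal Python sketch. It is not the "sda" implementation. It assumes that the cat score of class k is the vector of t-scores of the class centroid against the pooled centroid, premultiplied by the inverse square root of the pooled correlation matrix. The function name `cat_scores` and the fixed shrinkage intensity `lam` are hypothetical choices made for illustration. The paper selects its regularization parameters analytically, which this sketch does not attempt, and FNDR thresholding is omitted.

```python
import numpy as np


def cat_scores(X, y, lam=0.5):
    """Illustrative cat scores, one column per class (hypothetical helper).

    lam is a fixed shrinkage intensity towards the identity matrix. It is a
    placeholder assumption, not the analytic estimate described in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes = np.unique(y)
    K = len(classes)
    mu_pool = X.mean(axis=0)

    # Residuals around each class centroid give the pooled within-class
    # variances and correlations.
    resid = np.empty_like(X)
    for k in classes:
        idx = y == k
        resid[idx] = X[idx] - X[idx].mean(axis=0)
    var = (resid ** 2).sum(axis=0) / (n - K)

    # Empirical pooled correlation matrix, shrunk towards the identity.
    Z = resid / np.sqrt(var)
    R = Z.T @ Z / (n - K)
    R_shrunk = (1.0 - lam) * R + lam * np.eye(p)

    # Inverse matrix square root via the eigendecomposition.
    w, U = np.linalg.eigh(R_shrunk)
    R_inv_sqrt = (U / np.sqrt(w)) @ U.T

    scores = np.empty((p, K))
    for j, k in enumerate(classes):
        idx = y == k
        n_k = idx.sum()
        # t-score of the class centroid against the pooled centroid.
        t = (X[idx].mean(axis=0) - mu_pool) / np.sqrt((1.0 / n_k - 1.0 / n) * var)
        # Decorrelate the t-scores to obtain the cat scores.
        scores[:, j] = R_inv_sqrt @ t
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    y = np.repeat([0, 1], 20)
    X[y == 1, 0] += 2.0  # only feature 0 separates the two classes
    s = cat_scores(X, y)
    # Rank features by the squared cat scores summed over classes.
    print(np.argsort(-(s ** 2).sum(axis=1)))
```

With `lam=1.0` the shrunken correlation matrix is the identity and the scores reduce to ordinary t-scores, which is a convenient sanity check. Features would then be selected by thresholding a summary of these scores. Choosing that threshold through FNDR control requires a false discovery rate model that is outside the scope of this sketch.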

Code & Models

Repositories

mjafin/shrinkage_da

Taxonomy

Methods: Linear Discriminant Analysis