Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis

Bernd Klaus

arXiv:1205.6653 · stat.ME · August 9, 2012 · 2 cites

Open Access

TL;DR

This paper introduces a simple, efficient effect size estimation method for variable selection in linear discriminant analysis, improving interpretability and accuracy in biological sample classification.

Contribution

It proposes a new effect size estimation approach and demonstrates its effectiveness for variable selection based on misclassification rate in LDA.

Findings

01 Methods produce compact, interpretable feature sets
02 Competitive performance in simulations and real data
03 Efficient computation with improved interpretability

Abstract

Supervised classification of biological samples based on genetic information (e.g., gene expression profiles) is an important problem in biostatistics. To find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is also computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and…
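
The idea of selecting variables through a misclassification rate can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes two classes with equal priors, uncorrelated variables (diagonal LDA), and it plugs in raw standardized mean differences as effect sizes. Under those assumptions, with known parameters, the misclassification rate is Φ(−Δ/2), where Δ² is the sum of squared effect sizes of the variables in the rule. The function names and the `tol` parameter are hypothetical and chosen only for this illustration.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def effect_sizes(X, y):
    """Plug-in effect sizes: mean difference divided by the pooled standard deviation.

    Assumption for this sketch: y contains only the labels 0 and 1.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    pooled_var = (
        (n0 - 1) * X0.var(axis=0, ddof=1) + (n1 - 1) * X1.var(axis=0, ddof=1)
    ) / (n0 + n1 - 2)
    return (X1.mean(axis=0) - X0.mean(axis=0)) / np.sqrt(pooled_var)


def misclassification_rate(d):
    """Phi(-Delta/2) with Delta^2 = sum of squared effect sizes.

    Holds for two classes, equal priors, uncorrelated variables, known parameters.
    """
    delta = math.sqrt(float(np.sum(np.square(d))))
    return normal_cdf(-delta / 2.0)


def select_variables(d, tol=0.01):
    """Rank variables by |effect size| and keep the smallest top-k set whose
    rate is within `tol` of the rate obtained with all variables."""
    order = np.argsort(-np.abs(d))
    rates = [misclassification_rate(d[order[:k]]) for k in range(1, len(d) + 1)]
    best = min(rates)
    k = next(i for i, r in enumerate(rates, start=1) if r <= best + tol)
    return order[:k], rates[k - 1]


# Usage with made-up effect sizes (illustrative numbers, not results from the paper)
d = np.array([0.1, 3.0, -2.0, 0.0])
selected, rate = select_variables(d, tol=0.01)
print(selected, round(rate, 4))
```

Because the plug-in rate never increases as variables are added, the sketch stops at the smallest set within `tol` of the best rate rather than at the minimum itself. Raw plug-in effect sizes tend to be overoptimistic when there are many variables and few samples, which is one reason the quality of the effect size estimates matters for this kind of selection.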

Taxonomy

Topics: Gene expression and cancer classification · Data Mining Algorithms and Applications