Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis
Bernd Klaus

TL;DR
This paper introduces a simple, efficient effect size estimation method for variable selection in linear discriminant analysis, improving interpretability and accuracy in biological sample classification.
Contribution
It proposes a new effect size estimation approach and demonstrates its effectiveness for variable selection based on misclassification rate in LDA.
Findings
Methods produce compact, interpretable feature sets
Competitive performance in simulations and real data
Efficient computation with improved interpretability
Abstract
Supervised classification of biological samples based on genetic information (e.g. gene expression profiles) is an important problem in biostatistics. In order to find classification rules that are both accurate and interpretable, variable selection is indispensable. This article explores how an assessment of the individual importance of variables (effect size estimation) can be used to perform variable selection. I review recent effect size estimation approaches in the context of linear discriminant analysis (LDA) and propose a new, conceptually simple effect size estimation method that is also computationally efficient. I then show how to use effect sizes to perform variable selection based on the misclassification rate, which is the data-independent expectation of the prediction error. Simulation studies and real data analyses illustrate that the proposed effect size estimation and…
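To make the link between effect sizes and the misclassification rate concrete, here is a minimal sketch. It is not the paper's procedure. It assumes two equally likely Gaussian classes with a shared diagonal covariance, so that the misclassification rate of the LDA rule is Φ(−Δ/2), where Δ² is the sum of the squared standardized mean differences of the included variables. The function names, the example effect sizes, and the stopping tolerance `tol` are all hypothetical, and the effect sizes are treated as known rather than estimated.

```python
from math import sqrt
from statistics import NormalDist


def misclassification_rate(effect_sizes):
    """Error rate Phi(-Delta/2) for two equal-prior Gaussian classes.

    Assumes independent variables with a common within-class variance, so
    Delta^2 is the sum of squared standardized mean differences.
    """
    delta = sqrt(sum(d * d for d in effect_sizes))
    return NormalDist().cdf(-delta / 2.0)


def select_variables(effect_sizes, tol=0.01):
    """Rank variables by |effect size| and add them greedily.

    Stops as soon as adding the next variable lowers the misclassification
    rate by no more than `tol` (a hypothetical stopping rule).
    """
    order = sorted(range(len(effect_sizes)), key=lambda j: -abs(effect_sizes[j]))
    selected = []
    rate = 0.5  # with no variables, equal priors give chance-level error
    for j in order:
        candidate = [effect_sizes[k] for k in selected + [j]]
        new_rate = misclassification_rate(candidate)
        if rate - new_rate <= tol:
            break
        selected.append(j)
        rate = new_rate
    return selected, rate


# Hypothetical effect sizes: two informative variables, three near-null ones.
effects = [0.1, 2.0, 0.0, 1.5, 0.05]
selected, rate = select_variables(effects)
print(selected, round(rate, 4))
```

With known effect sizes, the rate in this setting can only decrease as variables are added, so the tolerance is what keeps the selected set compact. With estimated effect sizes, noise variables can hurt prediction, which is one reason careful effect size estimation matters.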
Taxonomy
Topics: Gene expression and cancer classification · Data Mining Algorithms and Applications
