Quantile Based Variable Mining : Detection, FDR based Extraction and Interpretation
S. Mukhopadhyay, Emanuel Parzen, S. N. Lahiri

TL;DR
This paper introduces a unified distributional framework for high-dimensional variable selection in classification, utilizing CR-statistics and FDR-based thresholding to improve detection, extraction, and interpretation of important variables.
Contribution
It develops a novel quantile-based comparison analysis approach and the CDfdr algorithm for adaptive thresholding, advancing variable selection methodology.
Findings
Effective detection of important variables demonstrated on real datasets
Unified approach improves interpretability and selection accuracy
New FDR thresholding method enhances variable extraction
Abstract
This paper outlines a unified framework for high-dimensional variable selection in classification problems. Traditional approaches to finding interesting variables mostly use only partial information through moments (such as the mean difference). In contrast, this paper addresses the question of variable selection in full generality from a distributional point of view. If a variable is not important for classification, then it will have a similar distribution under the different classes. This simple observation motivates us to quantify 'how and why' the distribution of a variable changes over classes through the CR-statistic. The second contribution of our paper is to develop and investigate FDR-based thresholding from a completely new point of view for adaptive thresholding, which leads to an elegant algorithm called CDfdr. This paper attempts to…
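The general idea in the abstract (compare each variable's distribution across classes, then threshold with FDR control) can be illustrated with a minimal sketch. This is not the paper's CR-statistic or its CDfdr algorithm, neither of which is specified in the text above. As stand-ins, the sketch assumes a two-sample Kolmogorov-Smirnov statistic with permutation p-values and the standard Benjamini-Hochberg step-up rule. All function names (`ks_statistic`, `permutation_pvalue`, `benjamini_hochberg`, `select_variables`) and the simulated data are hypothetical.

```python
import random


def ks_statistic(x, y):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both pointers past every observation <= v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def permutation_pvalue(x, y, n_perm=200, rng=None):
    """P-value for the KS statistic under random relabelling of the classes."""
    rng = rng or random.Random(0)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvalues[k] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def select_variables(class0, class1, alpha=0.1, n_perm=200, seed=0):
    """class0 and class1 are lists of columns, one list of values per variable."""
    rng = random.Random(seed)
    pvalues = [permutation_pvalue(a, b, n_perm, rng)
               for a, b in zip(class0, class1)]
    return benjamini_hochberg(pvalues, alpha)


def demo():
    # Simulated data (an assumption): 10 variables, 40 samples per class.
    # Only variables 0 and 1 have a shifted distribution in the second class.
    rng = random.Random(1)
    class0 = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(10)]
    class1 = [[rng.gauss(3.0 if v < 2 else 0.0, 1.0) for _ in range(40)]
              for v in range(10)]
    print("selected variables:", select_variables(class0, class1))


if __name__ == "__main__":
    demo()
```

A statistic based on the whole empirical distribution, rather than on a mean difference, is used here because the abstract emphasizes distributional comparison. The KS statistic is only one such choice and says nothing about 'how and why' a distribution changes, which is the role the abstract assigns to the CR-statistic.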
Taxonomy
Topics: Statistical Methods and Inference · Neural Networks and Applications · Advanced Statistical Methods and Models
