An entropic feature selection method in perspective of Turing formula
Jingyi Shi, Jialin Zhang, Yaorong Ge

TL;DR
This paper introduces an entropy-based feature selection method for healthcare data that selects features more efficiently and determines the number of features to keep automatically. It is especially effective with small samples.
Contribution
It proposes a new CASMI-based feature selection method that handles redundancy and automatically determines the number of features, improving performance on small healthcare datasets.
Findings
Outperforms six existing methods in Information Recovery Ratio
More effective with small sample sizes
Handles feature redundancy through joint-distribution analysis
Abstract
Health data are generally complex in type and small in sample size. Such domain-specific challenges make it difficult to capture information reliably and contribute further to the issue of generalization. To assist the analytics of healthcare datasets, we develop a feature selection method based on the concept of Coverage Adjusted Standardized Mutual Information (CASMI). The main advantages of the proposed method are: 1) it selects features more efficiently with the help of an improved entropy estimator, particularly when the sample size is small, and 2) it automatically learns the number of features to be selected based on the information from sample data. Additionally, the proposed method handles feature redundancy from the perspective of joint-distribution. The proposed method focuses on non-ordinal data, while it works with numerical data with an appropriate binning method. A…
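The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes a plug-in entropy in place of the paper's improved entropy estimator, assumes the coverage adjustment is Turing's formula (1 minus the share of values seen exactly once) applied to the feature sample, and assumes a simple "stop when the score no longer improves" rule for choosing the number of features. The names `coverage_adjusted_score` and `greedy_select`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log


def entropy(values):
    """Plug-in (maximum-likelihood) Shannon entropy in nats."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def turing_coverage(values):
    """Turing's formula for sample coverage: 1 - N1/n, N1 = number of singletons."""
    n = len(values)
    singletons = sum(1 for c in Counter(values).values() if c == 1)
    return 1.0 - singletons / n


def coverage_adjusted_score(x, y):
    """Assumed score: I(X;Y) / H(Y), scaled by the estimated coverage of X."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0
    mutual_info = entropy(x) + h_y - entropy(list(zip(x, y)))
    return (mutual_info / h_y) * turing_coverage(x)


def greedy_select(features, y, tol=1e-12):
    """Greedy forward selection on the JOINT distribution of chosen features.

    Scoring the joint tuple (rather than each feature alone) means a redundant
    feature adds nothing, and selection stops by itself once no candidate
    raises the score (an assumed stopping rule).
    """
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        scores = {}
        for name in remaining:
            columns = [features[f] for f in selected + [name]]
            scores[name] = coverage_adjusted_score(list(zip(*columns)), y)
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best + tol:
            break
        selected.append(candidate)
        best = scores[candidate]
        remaining.remove(candidate)
    return selected


# Toy data (hypothetical): 8 patients, binary outcome.
y = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": ["p", "p", "p", "p", "q", "q", "q", "q"],       # fully informative
    "a_copy": ["p", "p", "p", "p", "q", "q", "q", "q"],  # redundant duplicate
    "patient_id": list(range(8)),                        # all singletons
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],                   # independent of y
}
chosen = greedy_select(features, y)
```

In this toy setting the identifier-like column looks perfectly informative to a plain mutual-information score, but every one of its values is a singleton, so its estimated coverage is zero and the adjusted score discards it. The duplicate column is skipped because the joint distribution of `a` and `a_copy` carries no more information than `a` alone.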
Taxonomy
Topics: Statistical Methods and Inference · Face and Expression Recognition · Gene expression and cancer classification
