When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing
Mousam Sinha, Tirtha Sarathi Ghosh, Ridam Pal

TL;DR
This paper introduces a statistically grounded feature selection method that uses noise-based hypothesis testing. It outperforms existing techniques on both simulated and real-world datasets by reliably identifying informative predictors.
Contribution
The paper presents a novel feature selection approach that incorporates multiple noise features and bootstrap hypothesis testing, providing a theoretical foundation and improved performance over existing methods.
Findings
Consistently outperforms Boruta and Knockoff in recovering signals.
Demonstrates robustness across diverse real-world datasets.
Offers a principled, statistically justified feature selection framework.
Abstract
Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite widespread adoption, many established techniques suffer from notable limitations: some incur substantial computational cost, while others offer neither a definite, statistically driven stopping criterion nor an assessment of the significance of their importance scores. A common heuristic approach introduces multiple random noise features and retains all predictors ranked above the strongest noise feature. Although intuitive, this strategy lacks theoretical justification and depends heavily on heuristics. This paper proposes a novel feature selection method that addresses these limitations. Our approach introduces multiple random noise features and evaluates each feature's…
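The heuristic baseline that the abstract criticizes (append random noise features, then keep every real predictor that ranks above the strongest noise feature) can be sketched as follows. This is a minimal illustration of that baseline only, not the paper's proposed method, whose description is cut off above. The function name `select_above_strongest_noise`, the choice of absolute Pearson correlation as the importance score, the number of noise features, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def select_above_strongest_noise(X, y, n_noise=20, seed=0):
    """Heuristic baseline: keep features that outrank every injected noise feature.

    The importance score (absolute Pearson correlation with y) is an
    assumption for this sketch; any model-based importance could be swapped in.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape

    # Append purely random columns that carry no information about y.
    noise = rng.standard_normal((n_samples, n_noise))
    X_aug = np.hstack([X, noise])

    # Score every column, real and noise alike, by |corr(column, y)|.
    Xc = X_aug - X_aug.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

    # The strongest noise feature sets the cut-off.
    threshold = scores[n_features:].max()
    selected = np.flatnonzero(scores[:n_features] > threshold)
    return selected, scores[:n_features], threshold


# Synthetic data (hypothetical): only features 0 and 3 influence y.
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.standard_normal(500)

selected, scores, threshold = select_above_strongest_noise(X, y)
print("selected feature indices:", selected)
print("noise threshold:", round(float(threshold), 3))
```

The cut-off here depends on a single draw of the noise features, so an uninformative predictor can pass by chance. That sensitivity is the kind of weakness the abstract points to when it says the strategy lacks theoretical justification.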
Taxonomy
Topics: Machine Learning and Data Classification · Gene expression and cancer classification · Face and Expression Recognition
