Controlling Costs: Feature Selection on a Budget
Guo Yu, Daniela Witten, Jacob Bien

TL;DR
This paper introduces cheap knockoffs, a cost-aware feature selection method that trades off model accuracy against the cost of measuring each feature. The method comes with an upper bound on the fraction of feature cost wasted on unimportant features, and it is evaluated in simulations and on biomedical data.
Contribution
The paper proposes a cost-sensitive feature selection procedure that weighs each feature's cost against its importance, derives an upper bound on the procedure's weighted false discovery proportion, and evaluates the procedure in simulations and on real data.
Findings
The method controls the fraction of feature cost wasted on unimportant features.
The bound on wasted cost holds simultaneously, with high probability, over a path of selected feature sets of increasing size.
Simulations and a biomedical application illustrate the practical value of accounting for cost during feature selection.
Abstract
The traditional framework for feature selection treats all features as costing the same amount. However, in reality, a scientist often has considerable discretion regarding which variables to measure, and the decision involves a tradeoff between model accuracy and cost (where cost can refer to money, time, difficulty, or intrusiveness). In particular, unnecessarily including an expensive feature in a model is worse than unnecessarily including a cheap feature. We propose a procedure, which we call cheap knockoffs, for performing feature selection in a cost-conscious manner. The key idea behind our method is to force higher cost features to compete with more knockoffs than cheaper features. We derive an upper bound on the weighted false discovery proportion associated with this procedure, which corresponds to the fraction of the feature cost that is wasted on unimportant features. We…
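The two quantities the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names, and the costs are hypothetical, and uniform random draws stand in for real feature statistics. The first function computes the fraction of selected feature cost that is wasted on unimportant features. The second estimates how often an unimportant feature beats all of its knockoffs, assuming its statistic is exchangeable with theirs, which shows why competing against more knockoffs makes a spurious selection less likely.

```python
import random


def weighted_fdp(selected, important, cost):
    """Fraction of the selected features' total cost spent on unimportant features."""
    total = sum(cost[j] for j in selected)
    if total == 0:
        # Convention assumed for this sketch: nothing selected means nothing wasted.
        return 0.0
    wasted = sum(cost[j] for j in selected if j not in important)
    return wasted / total


def null_win_rate(n_knockoffs, n_trials=20000, seed=0):
    """Monte Carlo estimate of how often an unimportant feature beats all its knockoffs.

    Assumption: the feature's statistic is exchangeable with its knockoffs'
    statistics, modeled here as independent uniform draws.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        original = rng.random()
        if all(original > rng.random() for _ in range(n_knockoffs)):
            wins += 1
    return wins / n_trials


# Hypothetical features and costs, for illustration only.
cost = {"age": 1, "blood_panel": 5, "mri": 20}
selected = ["age", "blood_panel", "mri"]
important = {"age", "blood_panel"}

print(weighted_fdp(selected, important, cost))  # 20 / 26: the expensive false discovery dominates
print(null_win_rate(1), null_win_rate(4))       # estimates near 1/2 and 1/5
```

Under the exchangeability assumption, an unimportant feature facing k knockoffs wins with probability 1/(k + 1), so in this sketch a feature given more knockoffs is less likely to be selected by chance.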
Taxonomy
Topics: Statistical Methods in Clinical Trials · Statistical Methods and Inference · Advanced Statistical Process Monitoring
