Asymptotically Optimal Knockoff Statistics via the Masked Likelihood Ratio
Asher Spector, William Fithian

TL;DR
This paper introduces masked likelihood ratio (MLR) statistics for feature selection with knockoffs. MLR statistics asymptotically maximize the number of discoveries under a user-specified Bayesian model while retaining false positive control, and they outperform existing feature statistics in simulations.
Contribution
The paper proposes MLR statistics that asymptotically optimize power in knockoff-based feature selection without restrictive assumptions.
Findings
MLR statistics outperform state-of-the-art feature statistics in simulations.
MLR maintains error control while increasing discovery rates.
The implementation in the Python package knockpy is efficient and often faster than fitting a cross-validated lasso.
Abstract
In feature selection problems, knockoffs are synthetic controls for the original features. Employing knockoffs allows analysts to use nearly any variable importance measure or "feature statistic" to select features while rigorously controlling false positives. However, it is not clear which statistic maximizes power. In this paper, we argue that state-of-the-art lasso-based feature statistics often prioritize features that are unlikely to be discovered, leading to low power in real applications. Instead, we introduce masked likelihood ratio (MLR) statistics, which prioritize features according to one's ability to distinguish each feature from its knockoff. Although no single feature statistic is uniformly most powerful in all situations, we show that MLR statistics asymptotically maximize the number of discoveries under a user-specified Bayesian model of the data. (Like all feature…
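To make concrete how a feature statistic turns into a set of selected features with false positive control, here is a minimal sketch of the standard knockoff+ selection rule. It is a generic illustration, not the MLR statistic and not the knockpy implementation. It assumes the statistics `W` have already been computed, with a large positive `W[j]` meaning feature `j` looks more important than its knockoff. The function names and the example values are hypothetical.

```python
def knockoff_threshold(W, q):
    """Data-dependent knockoff+ threshold at target FDR level q.

    Returns the smallest t among the nonzero |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected).
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)  # knockoff "wins": estimate of false positives
        positives = sum(1 for w in W if w >= t)   # candidate discoveries at this threshold
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(W, q):
    """Indices of features whose statistic meets the knockoff+ threshold."""
    t = knockoff_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical statistics for ten features (illustrative values only).
W_example = [5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 0.2, -0.1, 2.0, 1.5]
selected = select_features(W_example, q=0.2)
```

With these example values the threshold is 1.5 and seven features are selected. Because only features with large positive statistics can be selected, a statistic that gives large magnitudes to features that are hard to tell apart from their knockoffs wastes early positions in the ordering, which is the behaviour the abstract attributes to lasso-based statistics.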
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data Analysis with R · Statistical Methods and Inference
