The Conditional Prediction Function: A Novel Technique to Control False Discovery Rate for Complex Models
Yushu Shi, Michael Martens

TL;DR
This paper introduces the Conditional Prediction Function (CPF) statistic, enhancing knockoff filtering by enabling FDR control in complex models like deep neural networks, thus improving variable selection in nonlinear and correlated data.
Contribution
The paper presents a novel CPF-based knockoff statistic that captures nonlinear relationships and correlations, extending FDR control to advanced machine learning models.
Findings
CPF statistics outperform traditional methods in simulations.
CPF statistics are effective at selecting relevant variables in real datasets.
CPF statistics provide better power for predictors with nonlinear effects and for correlated predictors.
Abstract
In modern scientific research, the objective is often to identify which variables are associated with an outcome among a large class of potential predictors. This goal can be achieved by selecting variables in a manner that controls the false discovery rate (FDR), the proportion of irrelevant predictors among the selections. Knockoff filtering is a cutting-edge approach to variable selection that provides FDR control. Existing knockoff statistics frequently employ linear models to assess relationships between features and the response, but the linearity assumption is often violated in real-world applications. This may result in poor power to detect truly prognostic variables. We introduce a knockoff statistic based on the conditional prediction function (CPF), which can pair with state-of-the-art machine learning predictive models, such as deep neural networks. The CPF statistics can…
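As a hedged illustration of the selection step in knockoff filtering, the sketch below applies the standard knockoff+ threshold rule (Barber and Candès) to a vector of feature statistics W. It assumes W has already been computed by some knockoff statistic, with large positive values favoring the original feature over its knockoff. The CPF statistic itself is not specified in this summary and is not implemented here. The function names, the example values of W, and the target level q = 0.2 are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (knockoff+ rule)."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics act as a proxy count of false positives.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def knockoff_select(W, q):
    """Return the indices of features whose statistic meets the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Hypothetical statistics for 10 features (illustrative values only).
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.2, 1.5]
threshold = knockoff_plus_threshold(W_example, q=0.2)
selected = knockoff_select(W_example, q=0.2)
print(threshold, selected.tolist())
```

Any statistic that satisfies the knockoff sign-flip property can be plugged in as W. The thresholding step does not depend on whether the underlying model is linear or a neural network, which is what allows a statistic built on a flexible predictive model to retain FDR control.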
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Statistical Methods and Inference
