When Knockoffs fail: diagnosing and fixing non-exchangeability of Knockoffs
Alexandre Blain, Angel Reyero Lobo, Julia Linhart, Bertrand Thirion, Pierre Neuvial

TL;DR
This paper shows that the exchangeability assumption behind knockoff methods can be violated by classical knockoff generators in common settings, introduces a diagnostic tool to detect such violations, and proposes an alternative knockoff construction that restores reliable variable selection in high-dimensional data.
Contribution
It introduces a diagnostic test for knockoff exchangeability violations and proposes an improved knockoff construction method that maintains statistical control.
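The summary does not describe the improved construction itself. For context, the sketch below shows a classical generator of the kind the abstract says can break exchangeability: equicorrelated Gaussian model-X knockoffs. It assumes each row of X is drawn from N(0, Σ) with Σ a known correlation matrix; the function name `gaussian_knockoffs` and the `shrink` parameter are hypothetical illustrations, not the authors' code. It samples X̃ | X ~ N(X - XΣ⁻¹D, 2D - DΣ⁻¹D) with D = s·I.

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng, shrink=0.999):
    """Equicorrelated Gaussian model-X knockoffs (hypothetical helper).

    Assumes each row of X is drawn i.i.d. from N(0, Sigma), with Sigma a
    known correlation matrix (unit diagonal). Returns X_tilde, same shape as X.
    """
    p = Sigma.shape[0]
    # Equicorrelated choice s = min(1, 2 * lambda_min(Sigma)), shrunk slightly so
    # the conditional covariance below is strictly positive definite.
    s = shrink * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)       # Sigma^{-1} D
    # X_tilde | X ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D), row by row.
    mean = X - X @ Sigma_inv_D
    cov = 2.0 * D - D @ Sigma_inv_D
    cov = (cov + cov.T) / 2.0                     # symmetrize against round-off
    L = np.linalg.cholesky(cov)
    # Rows of standard_normal @ L.T have covariance L @ L.T = cov.
    return mean + rng.standard_normal(X.shape) @ L.T
```

With the exact Σ and Gaussian data, the pair (X, X̃) has Cov(X̃) = Σ and Cov(X, X̃) = Σ - D, which is what makes swapping X_j and X̃_j leave the joint distribution unchanged. If Σ has to be estimated or the data are not Gaussian, this guarantee holds only approximately, which is one way exchangeability can fail in practice.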
Findings
Violations of exchangeability cause false positive inflation.
The diagnostic tool effectively detects exchangeability violations.
The proposed method restores error control in high-dimensional settings.
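Why a violation inflates false positives can be seen in the selection step. The sketch below implements the standard knockoff+ threshold, assuming feature statistics W_j have already been computed (for example, lasso coefficient differences |β_j| - |β̃_j|; the summary does not say which statistic the paper uses). The function name `knockoff_plus_select` is a hypothetical illustration. The count of negative W_j serves as an estimate of the number of false positives, which is only valid when null W_j are sign-symmetric, a property that follows from exchangeability.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ threshold at target FDR q."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes |W_j|, smallest first,
    # so the first one that passes is the minimal valid threshold.
    for t in np.sort(np.abs(W[W != 0])):
        # Negatives estimate the false positives among {W_j >= t}; this relies
        # on null W_j having a sign-symmetric distribution.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)  # no threshold reaches the target FDR
```

If exchangeability fails so that null W_j tend to be positive, few negatives appear, the estimated false discovery proportion is too optimistic, and too many nulls pass the threshold.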
Abstract
Knockoffs are a popular statistical framework that addresses the challenging problem of conditional variable selection in high-dimensional settings with statistical control. Such statistical control is essential for the reliability of inference. However, knockoff guarantees rely on an exchangeability assumption that is difficult to test in practice, and there is little discussion in the literature on how to proceed when it is not met. This assumption is related to the ability to generate knockoff data similar to the observed data. To maintain reliable inference, we introduce a diagnostic tool based on Classifier Two-Sample Tests. Using simulations and real data, we show that violations of this assumption occur in common settings for classical knockoff generators, especially when the data have a strong dependence structure. As a consequence, knockoff-based inference suffers from a massive…
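The abstract names Classifier Two-Sample Tests (C2ST) but not the exact procedure, so the following is a generic sketch under stated assumptions, not the authors' diagnostic. Exchangeability says that swapping X_j with X̃_j for any subset S leaves the distribution of [X, X̃] unchanged. The sketch swaps a random subset S in half of the rows, trains a classifier to tell swapped rows from unswapped ones, and tests whether held-out accuracy beats 0.5. The helper names, the hand-written logistic regression on quadratic features (chosen to keep the example NumPy-only; any classifier, such as one from scikit-learn, could replace it), and the normal approximation to the binomial p-value are all assumptions.

```python
import math
import numpy as np

def quadratic_features(Z):
    # Linear terms plus all pairwise products z_a * z_b (a <= b), so a linear
    # classifier can pick up changes in variances and correlations.
    iu = np.triu_indices(Z.shape[1])
    quad = (Z[:, :, None] * Z[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([Z, quad])

def fit_logistic(F, y, steps=2000, lr=0.1):
    # Plain gradient-descent logistic regression on standardized features.
    mu, sd = F.mean(0), F.std(0) + 1e-12
    Fs = np.hstack([np.ones((len(F), 1)), (F - mu) / sd])
    w = np.zeros(Fs.shape[1])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-Fs @ w))
        w -= lr * Fs.T @ (prob - y) / len(y)
    return lambda G: np.hstack([np.ones((len(G), 1)), (G - mu) / sd]) @ w > 0

def c2st_swap_test(X, Xk, rng, steps=2000):
    """Return (held-out accuracy, one-sided p-value) of a swap-based C2ST."""
    n, p = X.shape
    Z = np.hstack([X, Xk])
    S = rng.choice(p, size=max(1, p // 2), replace=False)   # random swap set S
    idx = np.arange(2 * p)
    idx[S], idx[S + p] = S + p, S                           # X_j <-> X~_j, j in S
    Z_swap = Z[:, idx]

    rows = rng.permutation(n)
    half = (n // 4) * 2                                      # even-sized halves
    train, test = rows[:half], rows[half:2 * half]

    def labelled(r):
        y = (np.arange(len(r)) % 2).astype(float)           # balanced 0/1 labels
        D = np.where(y[:, None] == 1, Z_swap[r], Z[r])      # label 1 = swapped
        return quadratic_features(D), y

    F_tr, y_tr = labelled(train)
    F_te, y_te = labelled(test)
    predict = fit_logistic(F_tr, y_tr, steps=steps)
    m = len(y_te)
    correct = int(np.sum(predict(F_te) == (y_te == 1)))
    # Under exchangeability, accuracy is centered at 0.5 (normal approximation).
    z = (correct - m / 2) / math.sqrt(m / 4)
    return correct / m, 0.5 * math.erfc(z / math.sqrt(2))
```

A small p-value signals that swapped and unswapped rows are distinguishable, so the knockoffs are not exchangeable with X. A large p-value only means this classifier found no difference; it does not prove exchangeability.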
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Gene expression and cancer classification
