Semi-knockoffs: a model-agnostic conditional independence testing method with finite-sample guarantees
Angel Reyero-Lobo, Bertrand Thirion, Pierre Neuvial

TL;DR
Semi-knockoffs is a flexible, model-agnostic conditional independence testing method that avoids data splitting, provides valid p-values and FDR control, and extends applicability to high-dimensional, real-world data scenarios.
Contribution
The paper introduces Semi-knockoffs, a novel conditional independence testing (CIT) approach that accommodates any pre-trained model without data splitting and requires only conditional expectations for continuous variables.
Findings
Provides valid p-values and FDR control in high-dimensional settings.
Ensures validity with new theoretical results on stability and double-robustness.
Applicable to any pre-trained model, enhancing practical utility.
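FDR control is commonly obtained by passing per-feature p-values through a multiple-testing procedure. As a generic illustration only, here is a minimal sketch of the standard Benjamini-Hochberg step-up rule. This summary does not state which selection procedure the paper uses, so this is not presented as the authors' method. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg rule.

    Generic illustration, not the paper's procedure. BH controls the FDR
    at level alpha when the p-values are independent (or positively
    dependent in the PRDS sense).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is below its threshold.
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values, one per feature being tested.
p_values = [0.001, 0.04, 0.03, 0.20, 0.008, 0.60]
selected = benjamini_hochberg(p_values, alpha=0.1)
print(selected)
```

With valid p-values as input, the returned indices are the features declared conditionally dependent on the outcome at the chosen FDR level.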
Abstract
Conditional independence testing (CIT) is essential for reliable scientific discovery. It prevents spurious findings and enables controlled feature selection. Recent CIT methods have used machine learning (ML) models as surrogates of the underlying distribution. However, model-agnostic approaches require a train-test split, which reduces statistical power. We introduce Semi-knockoffs, a CIT method that can accommodate any pre-trained model, avoids this split, and provides valid p-values and false discovery rate (FDR) control for high-dimensional settings. Unlike methods that rely on the model-X assumption (known input distribution), Semi-knockoffs only require conditional expectations for continuous variables. This makes the procedure less restrictive and more practical for machine learning integration. To ensure validity when estimating these expectations, we present two new…
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference
