# Relaxing the Assumptions of Knockoffs by Conditioning

**Authors:** Dongming Huang, Lucas Janson

arXiv: 1903.02806 · 2020-06-16

## TL;DR

This paper extends model-X knockoffs so that the covariate distribution no longer needs to be known exactly: it need only be known up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, and the false discovery rate guarantee is unchanged.

## Contribution

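The paper shows that the guarantees of model-X knockoffs continue to hold when the covariate distribution is known only up to a parametric model. The key step is to treat the covariates as drawn conditionally on the observed value of a sufficient statistic of that model.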

## Key findings

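- Provable, non-asymptotic FDR control is retained when the covariate distribution is known only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters.
- Even in Gaussian models, conditioning on a sufficient statistic yields a distribution supported on a set of zero Lebesgue measure; the paper establishes valid algorithms for three models of interest.
- Simulations show the conditional approach remains powerful under the weaker assumptions.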
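
For orientation, FDR control in knockoff procedures is usually delivered by a final selection step applied to per-variable statistics $W_j$, where large positive values suggest a real signal. The sketch below shows the widely used "knockoff+" threshold rule; it is not taken from this paper, and the function name `knockoff_plus_select` and its inputs are hypothetical choices made for illustration.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Select variables with the knockoff+ threshold at target FDR level q.

    W : feature statistics, one per variable (assumed already computed;
        how they are built is outside the scope of this sketch).
    q : target false discovery rate, e.g. 0.1.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of W, smallest first.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the target: select nothing.
    return np.array([], dtype=int)
```

The `1 +` in the numerator is what distinguishes knockoff+ from the less conservative variant without it; with few variables or a small `q`, the rule may select nothing at all.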

## Abstract

The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.
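
To make the conditioning idea concrete, here is a minimal sketch for one simple case, assuming i.i.d. Gaussian rows with unknown mean and covariance. In that model the column sums and the Gram matrix $X^\top X$ form a sufficient statistic, and any orthogonal transformation of the rows that fixes the all-ones vector leaves both unchanged. The code applies a random such transformation and checks that the statistic is preserved. This is only an illustration of why the conditional distribution lives on a lower-dimensional set and does not depend on the unknown parameters; it is not the paper's knockoff construction, and the function names are hypothetical.

```python
import numpy as np

def sufficient_statistic(X):
    """Column sums and Gram matrix of an n x p data matrix."""
    return X.sum(axis=0), X.T @ X

def resample_given_sufficient_statistic(X, rng):
    """Return a new n x p matrix with the same sufficient statistic as X.

    Assumption (illustrative only): rows of X are i.i.d. Gaussian with
    unknown mean and covariance, so the model is invariant to orthogonal
    row transformations that fix the all-ones vector.
    """
    n, _ = X.shape
    ones = np.ones((n, 1)) / np.sqrt(n)
    # Complete the unit all-ones vector to an orthonormal basis of R^n.
    A = np.hstack([ones, rng.standard_normal((n, n - 1))])
    B, _ = np.linalg.qr(A)
    C = B[:, 1:]  # orthonormal basis of the complement of the ones vector
    # Random orthogonal matrix on the (n-1)-dimensional complement,
    # with the usual sign fix so the QR factor is Haar-distributed.
    Q, R = np.linalg.qr(rng.standard_normal((n - 1, n - 1)))
    Q = Q * np.sign(np.diag(R))
    # Keep the component along the ones vector, rotate the rest.
    return ones @ (ones.T @ X) + C @ (Q @ (C.T @ X))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) + np.array([1.0, -2.0, 0.5])
X_new = resample_given_sufficient_statistic(X, rng)
sums_old, gram_old = sufficient_statistic(X)
sums_new, gram_new = sufficient_statistic(X_new)
print(np.allclose(sums_old, sums_new), np.allclose(gram_old, gram_new))
```

Every matrix produced this way satisfies the same $p + p(p+1)/2$ equality constraints as the original, which is why the conditional distribution has zero Lebesgue measure in $\mathbb{R}^{n \times p}$. Building valid knockoffs on such a set requires more than this resampling step.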

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.02806/full.md

## Figures

36 figures with captions in the complete paper: https://tomesphere.com/paper/1903.02806/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1903.02806/full.md

---
Source: https://tomesphere.com/paper/1903.02806