FANOK: Knockoffs in Linear Time

Armin Askari; Quentin Rebjock; Alexandre d'Aspremont; Laurent El Ghaoui

arXiv:2006.08790 · cs.LG · June 17, 2020

TL;DR

This paper introduces efficient algorithms for Gaussian model-X knockoffs that make false-discovery-rate-controlled feature selection practical at large scale, including factor-model covariance estimation and knockoff sampling procedures whose complexity is linear in the dimension.

Contribution

It presents algorithms for constructing Gaussian knockoffs that solve the underlying semidefinite program in $O(p^3)$ time for generic covariance matrices and in $O(pk^2)$ time under a rank-$k$ factor model, making very high-dimensional data tractable.
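For context, the Gaussian knockoff construction these algorithms target can be written as below. This is the standard formulation from the model-X knockoff literature (Candès et al., 2018), given as a sketch: it assumes $X \sim \mathcal{N}(0, \Sigma)$ with $\Sigma$ a correlation matrix, the symbols $s$ and $S$ are introduced here for illustration, and the paper's exact SDP objective may differ from the common form shown.

```latex
% Joint law of originals X and knockoffs \tilde X, with S = \operatorname{diag}(s)
\begin{pmatrix} X \\ \tilde X \end{pmatrix}
\sim \mathcal{N}\!\left(0,\;
\begin{pmatrix} \Sigma & \Sigma - S \\ \Sigma - S & \Sigma \end{pmatrix}\right),
\qquad
\text{valid} \iff S \succeq 0 \ \text{and}\ 2\Sigma - S \succeq 0.

% A common SDP for choosing s (larger s_j = knockoff less correlated with X_j)
\max_{s \in \mathbb{R}^p} \ \sum_{j=1}^{p} s_j
\quad \text{s.t.} \quad 0 \le s_j \le 1, \quad 2\Sigma - \operatorname{diag}(s) \succeq 0.

% Conditional law used to sample knockoffs once s is fixed
\tilde X \mid X \sim \mathcal{N}\!\left( (I - S\Sigma^{-1}) X,\; 2S - S\Sigma^{-1}S \right).
```

The validity condition follows from rotating the block matrix with $\tfrac{1}{\sqrt{2}}\begin{pmatrix} I & I \\ I & -I \end{pmatrix}$, which splits it into the diagonal blocks $2\Sigma - S$ and $S$.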

Findings

01

The algorithms are tested on problems with up to $p = 500{,}000$ features.

02

Assuming a rank-$k$ factor model on the covariance matrix reduces the cost of the knockoff semidefinite program from $O(p^3)$ to $O(pk^2)$.
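One reason a rank-$k$ factor model helps: if $\Sigma = D + UU^\top$ with $D$ diagonal and $U \in \mathbb{R}^{p \times k}$, the Woodbury identity replaces every $p \times p$ solve with a $k \times k$ one. The sketch below assumes that structure; `factor_solve` is a hypothetical helper illustrating the identity, not code from the paper or its repository.

```python
import numpy as np


def factor_solve(d: np.ndarray, U: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return (diag(d) + U @ U.T)^{-1} @ B without forming the p x p matrix.

    d: (p,) positive idiosyncratic variances, U: (p, k) factor loadings,
    B: (p, m) right-hand sides. Cost O(p k^2 + k^3 + p k m): linear in p.
    """
    k = U.shape[1]
    Dinv_B = B / d[:, None]                # D^{-1} B
    Dinv_U = U / d[:, None]                # D^{-1} U
    M = np.eye(k) + U.T @ Dinv_U           # k x k capacitance matrix I + U^T D^{-1} U
    # Woodbury: (D + U U^T)^{-1} = D^{-1} - D^{-1} U M^{-1} U^T D^{-1}
    return Dinv_B - Dinv_U @ np.linalg.solve(M, U.T @ Dinv_B)


# Usage: compare against a dense O(p^3) solve on a small instance.
rng = np.random.default_rng(0)
p, k = 1000, 5
d = rng.uniform(0.5, 1.5, size=p)
U = rng.standard_normal((p, k))
B = rng.standard_normal((p, 3))
Y = factor_solve(d, U, B)
Y_dense = np.linalg.solve(np.diag(d) + U @ U.T, B)
```

The dense reference is only there for checking; in a large-$p$ setting one would never form $D + UU^\top$ explicitly.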

03

Factor-model covariance estimation and knockoff sampling both run in time linear in the dimension $p$.
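One simple way to fit a factor model $\Sigma \approx \operatorname{diag}(d) + UU^\top$ in time linear in $p$ is a PCA-style estimator: take the top-$k$ principal directions of the centered data as loadings and the leftover per-feature variance as the diagonal. This is a swapped-in illustration, not necessarily the estimation procedure derived in the paper; `estimate_factor_model` and its `floor` parameter are hypothetical.

```python
import numpy as np


def estimate_factor_model(X: np.ndarray, k: int, floor: float = 1e-3):
    """PCA-style estimate of Sigma ~= diag(d) + U @ U.T from an n x p data matrix.

    Hypothetical helper. The thin SVD of the n x p matrix costs O(n^2 p)
    when n < p, i.e. linear in the dimension p.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                          # center each feature
    _, svals, Vt = np.linalg.svd(Xc / np.sqrt(n), full_matrices=False)
    U = Vt[:k].T * svals[:k]                         # (p, k) loadings from top-k components
    total_var = np.mean(Xc ** 2, axis=0)             # diagonal of the sample covariance
    d = np.maximum(total_var - np.sum(U ** 2, axis=1), floor)  # residual variances, kept positive
    return d, U


# Usage on synthetic data drawn from a true 3-factor model with p >> n.
rng = np.random.default_rng(1)
n, p, k = 200, 1000, 3
U_true = rng.standard_normal((p, k))
X = rng.standard_normal((n, k)) @ U_true.T + 0.5 * rng.standard_normal((n, p))
d_hat, U_hat = estimate_factor_model(X, k)
```

Because the loadings come from the same SVD as the sample covariance, $d_j + \lVert U_{j,:} \rVert^2$ reproduces the sample variance of feature $j$ whenever the `floor` is not active.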

Abstract

We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large-scale feature selection problems. Identifying the knockoff distribution requires solving a large-scale semidefinite program, for which we derive several efficient methods. One handles generic covariance matrices and has complexity scaling as $O(p^{3})$, where $p$ is the ambient dimension, while another assumes a rank-$k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^{2})$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500{,}000$.
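For reference, a dense baseline that samples Gaussian knockoffs from the conditional law $\tilde X \mid X \sim \mathcal{N}((I - S\Sigma^{-1})X,\ 2S - S\Sigma^{-1}S)$ costs $O(p^3)$, since it solves a system in $\Sigma$ and factors a $p \times p$ conditional covariance. The sketch assumes each row of `X` is drawn from $\mathcal{N}(0, \Sigma)$ with $\Sigma$ a known correlation matrix, and it swaps in the simple equicorrelated choice $s_j = \min(1, 2\lambda_{\min}(\Sigma))$ instead of solving the semidefinite program. The helper names are hypothetical; this is not the FANOK solver.

```python
import numpy as np


def equicorrelated_s(sigma: np.ndarray) -> np.ndarray:
    """Equicorrelated knockoff parameters s_j = min(1, 2 * lambda_min(sigma)).

    Assumes sigma is a correlation matrix (unit diagonal). A simple baseline
    choice, not the SDP solution computed by the paper's methods.
    """
    lam_min = np.linalg.eigvalsh(sigma)[0]  # eigvalsh returns ascending eigenvalues
    return np.full(sigma.shape[0], min(1.0, 2.0 * lam_min))


def sample_gaussian_knockoffs(X, sigma, s, rng):
    """Draw one knockoff row per row of X, where each row of X ~ N(0, sigma).

    Dense O(p^3) algebra: a linear solve in sigma plus an eigendecomposition
    of the p x p conditional covariance.
    """
    S = np.diag(s)
    sigma_inv_S = np.linalg.solve(sigma, S)    # Sigma^{-1} S
    mean = X - X @ sigma_inv_S                 # row form of (I - S Sigma^{-1}) x
    cond_cov = 2.0 * S - S @ sigma_inv_S       # 2S - S Sigma^{-1} S
    cond_cov = (cond_cov + cond_cov.T) / 2.0   # symmetrize against round-off
    w, V = np.linalg.eigh(cond_cov)
    root = V * np.sqrt(np.clip(w, 0.0, None))  # cond_cov = root @ root.T
    Z = rng.standard_normal(X.shape)
    return mean + Z @ root.T                   # add N(0, cond_cov) noise to each row


# Usage on a small equicorrelated design (rho = 0.3, so lambda_min = 0.7 and s_j = 1).
rng = np.random.default_rng(0)
p, rho = 5, 0.3
sigma = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
X = rng.multivariate_normal(np.zeros(p), sigma, size=20000)
s = equicorrelated_s(sigma)
X_ko = sample_gaussian_knockoffs(X, sigma, s, rng)
```

Up to sampling error, the empirical covariance of `X_ko` should match `sigma`, and the cross-covariance between `X` and `X_ko` should match $\Sigma - \operatorname{diag}(s)$. The $O(p^3)$ solve and eigendecomposition are exactly the steps that factor-model structure lets one avoid.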

Code & Models

Repositories

qrebjock/fanok (official)

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Machine Learning and Data Classification

Methods: Feature Selection