Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi; Chih-Hung Liu; Rajai Nasser; Gleb Novikov; David Steurer; Stefan Tiegel

arXiv:2111.02966·cs.LG·November 5, 2021

TL;DR

This paper introduces efficient, consistent estimators for PCA and sparse regression that remain accurate when an oblivious adversary corrupts a large fraction of the responses; the estimation error tends to zero as the number of samples grows.

Contribution

It develops a general machinery for designing estimators that are both computationally feasible and consistent under oblivious adversarial corruption, with specific advances for PCA and sparse regression.

Findings

1. Achieves consistency for sparse regression with optimal sample size and error rate.
2. Attains optimal error guarantees for PCA under broad assumptions.
3. Extends analysis of loss functions with non-smooth regularizers to robust estimation.

Abstract

We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d) / \alpha^{2}$ and optimal error rate $O((k \log d) / (n \cdot \alpha^{2}))$, where $n$ is the number of observations, $d$ is the number of dimensions and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1 / \log \log n)$, even for (non-spherical)…
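To make the sparse regression setting concrete, here is a minimal sketch of one natural estimator of this shape: Huber loss (listed under Methods on this page) plus an $\ell_1$ penalty, minimized by proximal gradient descent. This is an illustration under stated assumptions, not the authors' algorithm or analysis. The function names `huber_lasso` and `make_data`, the penalty weight `lam`, the Huber threshold `delta`, and the synthetic noise model (small Gaussian noise on an $\alpha$ fraction of responses, heavy-tailed symmetric noise on the rest, drawn independently of the design) are all hypothetical choices made for the example.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def huber_lasso(X, y, lam, delta=1.0, n_iter=500):
    """Minimize (1/n) * sum_i huber_delta(y_i - <x_i, beta>) + lam * ||beta||_1
    by proximal gradient descent (ISTA). Illustrative sketch only."""
    n, d = X.shape
    # The Huber loss has second derivative at most 1, so the smooth part has
    # gradient Lipschitz constant at most ||X||_2^2 / n; use step = 1 / L.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = y - X @ beta
        # The derivative of the Huber loss is the residual clipped to [-delta, delta].
        grad = -X.T @ np.clip(residual, -delta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def make_data(n=10_000, d=50, k=3, alpha=0.3, seed=0):
    """Synthetic k-sparse regression with noise drawn independently of X
    (an assumed stand-in for an oblivious adversary)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:k] = 1.0
    # Roughly an alpha fraction of responses get small noise (inliers);
    # the rest get large, symmetric, heavy-tailed noise (outliers).
    inlier = rng.random(n) < alpha
    noise = np.where(
        inlier,
        0.1 * rng.standard_normal(n),
        100.0 * rng.standard_cauchy(n),
    )
    y = X @ beta_star + noise
    return X, y, beta_star


if __name__ == "__main__":
    X, y, beta_star = make_data()
    beta_hat = huber_lasso(X, y, lam=0.03)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    print("Huber + l1 error:", np.linalg.norm(beta_hat - beta_star))
    print("least-squares error:", np.linalg.norm(beta_ols - beta_star))
```

Running the script compares the $\ell_2$ error of the sketch against ordinary least squares on the same corrupted data. The clipped gradient is what limits the influence of any single corrupted response, while the soft-thresholding step enforces sparsity; the value of `lam` here was picked by hand for this synthetic setup and is not a recommendation from the paper.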

Videos

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers · SlidesLive

Taxonomy

Methods: Huber loss · Principal Components Analysis