$k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation
Daniel Greenhut, Dan Feldman

TL;DR
This paper introduces the first polynomial-time deterministic algorithm for approximating the $k$-subspace median of high-dimensional data whose running time and approximation factor are both not exponential in $k$. The median subspace is more robust to noise and outliers than the subspace mean computed by classic $k$-PCA.
Contribution
The paper presents a polynomial-time deterministic algorithm for approximating the $k$-subspace median, a non-convex problem that is much harder to approximate than the subspace mean, with a guaranteed multiplicative approximation factor.
Findings
Algorithm runs in polynomial time
Achieves an approximation factor that is not exponential in $k$
Effective on real-world datasets
Abstract
Given an integer $k \geq 1$ and a set $P$ of $n$ points in $\mathbb{R}^d$, the classic $k$-PCA (Principal Component Analysis) approximates the affine \emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances ($\ell_{2,2}$-norm) over the points of $P$, i.e., the mean of these distances. The \emph{$k$-subspace median} is the subspace that minimizes its sum of (non-squared) Euclidean distances ($\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually more sparse and robust to noise/outliers than the mean, but also much harder to approximate since, unlike the $\ell_{z,z}$ (non-mixed) norms, it is non-convex for $k < d-1$. We provide the first polynomial-time deterministic algorithm whose running time and approximation factor are both not exponential in $k$. More precisely, the multiplicative…
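To make the two objectives concrete, here is a minimal sketch. It is not the paper's algorithm: it only evaluates the sum of squared distances (the mean objective) and the sum of non-squared distances (the median objective) for two hand-picked candidate lines in the plane. The function names, the toy points, and the requirement that the basis vectors be orthonormal are illustrative assumptions and do not come from the paper.

```python
import math


def distance_to_subspace(point, origin, basis):
    """Euclidean distance from `point` to the affine subspace
    origin + span(basis). Assumes the vectors in `basis` are orthonormal."""
    r = [p - o for p, o in zip(point, origin)]
    sq_norm = sum(x * x for x in r)
    # Squared length of the projection of r onto span(basis).
    sq_proj = sum(sum(b_i * r_i for b_i, r_i in zip(b, r)) ** 2 for b in basis)
    # Clamp at zero to guard against tiny negative rounding errors.
    return math.sqrt(max(sq_norm - sq_proj, 0.0))


def mean_cost(points, origin, basis):
    """Sum of squared distances: the objective that k-PCA minimizes."""
    return sum(distance_to_subspace(p, origin, basis) ** 2 for p in points)


def median_cost(points, origin, basis):
    """Sum of non-squared distances: the subspace-median objective."""
    return sum(distance_to_subspace(p, origin, basis) for p in points)


# Hypothetical toy data: six inliers on the x-axis and one outlier at (0, 10).
points = [(-3, 0), (-2, 0), (-1, 0), (1, 0), (2, 0), (3, 0), (0, 10)]
origin = (0, 0)
x_axis = [(1, 0)]  # candidate 1-subspace through the inliers
y_axis = [(0, 1)]  # candidate 1-subspace through the outlier

for name, basis in (("x-axis", x_axis), ("y-axis", y_axis)):
    print(name, mean_cost(points, origin, basis), median_cost(points, origin, basis))
```

On this toy data the squared objective prefers the line through the outlier (28 versus 100), while the non-squared objective prefers the line through the six inliers (10 versus 12). This is a small instance of the robustness that the abstract attributes to the median subspace; it compares two fixed candidates and does not search over subspaces.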
