On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling
Amit Deshpande, Rameshwar Pratap

TL;DR
This paper introduces an MCMC-based method for subset selection in $\ell_p$ subspace approximation that significantly reduces the number of data passes needed, achieving near-optimal subsets efficiently.
Contribution
The authors develop an MCMC algorithm that minimizes data passes for subset selection in $\ell_p$ subspace approximation, improving over previous adaptive sampling methods.
Findings
For $p=2$, the algorithm selects a subset of nearly optimal size in only 2 passes.
The algorithm achieves a $(1+\epsilon)$ approximation with a polynomial-size subset.
Extends to datasets with outliers and reduces passes for $p\geq 2$.
Abstract
We consider the problem of subset selection for $\ell_p$ subspace approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a small, representative subset of the given points such that its span gives a $(1+\epsilon)$ approximation to the best $k$-dimensional subspace that minimizes the sum of $p$-th powers of distances of all the points to this subspace. Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data. Matrix sketching techniques give a single-pass $(1+\epsilon)$ approximation for $\ell_p$ subspace approximation but require additional passes for subset selection. In this work, we propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms based on adaptive sampling. For $p=2$, our algorithm gives subset selection of nearly optimal size in only $2$ passes,…
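To make the setting concrete, here is a minimal Python sketch of adaptive sampling approximated by a Markov chain. It is an illustration under stated assumptions, not the authors' algorithm: it uses a generic Metropolis-Hastings chain with uniform proposals whose stationary distribution is proportional to the $p$-th power of each point's distance to the span of the points picked so far. The names `dist_pow`, `mh_sample`, `select_subset` and the chain length `steps` are hypothetical. The chain only evaluates ratios of weights for individual points, so it never needs the normalizing sum over the whole dataset that exact adaptive sampling recomputes in every round.

```python
import numpy as np

def dist_pow(points, basis, p=2.0):
    """p-th power of the Euclidean distance from each row of `points`
    to the row span of `basis` (basis may have zero rows)."""
    if basis.shape[0] == 0:
        return np.linalg.norm(points, axis=1) ** p
    # Least squares projects each point onto span(basis), also when
    # the basis rows are linearly dependent.
    coef, *_ = np.linalg.lstsq(basis.T, points.T, rcond=None)
    resid = points.T - basis.T @ coef
    return np.linalg.norm(resid, axis=0) ** p

def mh_sample(X, basis, p, steps, rng):
    """Run a Metropolis-Hastings chain with uniform proposals; its target
    is pi(i) proportional to dist(x_i, span(basis))^p (assumed chain)."""
    n = X.shape[0]
    cur = int(rng.integers(n))
    w_cur = dist_pow(X[cur:cur + 1], basis, p)[0]
    for _ in range(steps):
        prop = int(rng.integers(n))
        w_prop = dist_pow(X[prop:prop + 1], basis, p)[0]
        # Accept with probability min(1, w_prop / w_cur).
        if w_cur == 0 or rng.random() * w_cur < w_prop:
            cur, w_cur = prop, w_prop
    return cur

def select_subset(X, size, p=2.0, steps=50, seed=0):
    """Pick `size` row indices, one approximate adaptive sample per round."""
    rng = np.random.default_rng(seed)
    chosen = []
    for _ in range(size):
        basis = X[np.array(chosen, dtype=int)]
        chosen.append(mh_sample(X, basis, p, steps, rng))
    return chosen

def subspace_error(X, subset, p=2.0):
    """Sum of p-th powers of distances of all points to span(X[subset])."""
    return float(dist_pow(X, X[np.array(subset, dtype=int)], p).sum())
```

For example, `select_subset(X, size=5, p=2.0)` returns five row indices of `X`, and `subspace_error(X, subset, p)` evaluates the objective from the abstract on their span. How long the chain must run to be close enough to the adaptive sampling distribution is exactly the kind of question the paper analyzes; the fixed `steps=50` here is an arbitrary placeholder.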
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Complexity and Algorithms in Graphs
