Streaming and Distributed Algorithms for Robust Column Subset Selection
Shuli Jiang, Dongyu Li, Irene Mengze Li, Arvind V. Mahankali, David P. Woodruff

TL;DR
This paper introduces the first single-pass streaming and distributed algorithms for robust column subset selection using the entrywise p-norm, achieving near-optimal space and communication complexity with strong approximation guarantees.
Contribution
It presents novel streaming and distributed algorithms for p-norm column subset selection, leveraging new reductions and coreset constructions for robustness and efficiency.
Findings
Achieves a multiplicative approximation in the entrywise p-norm with space complexity that is optimal up to a logarithmic factor.
Extends to a 1-round distributed protocol with nearly optimal communication cost.
Provides practical algorithms with significant real-world data analysis benefits.
Abstract
We give the first single-pass streaming algorithm for Column Subset Selection with respect to the entrywise $\ell_p$-norm with $1 \leq p < 2$. We study the $\ell_p$ norm loss since it is often considered more robust to noise than the standard Frobenius norm. Given an input matrix $A \in \mathbb{R}^{d \times n}$ ($n \gg d$), our algorithm achieves a multiplicative $k^{\frac{1}{p} - \frac{1}{2}} \mathrm{poly}(\log nd)$-approximation to the error with respect to the best possible column subset of size $k$. Furthermore, the space complexity of the streaming algorithm is optimal up to a logarithmic factor. Our streaming algorithm also extends naturally to a 1-round distributed protocol with nearly optimal communication cost. A key ingredient in our algorithms is a reduction to column subset selection in the $\ell_{p,2}$-norm, which corresponds to the $p$-norm of the vector of Euclidean norms of…
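To make the two norms named in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names `entrywise_p_norm` and `p2_norm` and the small example matrix are illustrative assumptions. The first function computes the entrywise p-norm of a matrix, and the second computes the p-norm of the vector of Euclidean column norms, which is the quantity the reduction targets.

```python
import math

# Hypothetical helper names; the matrix is given as a list of rows.

def entrywise_p_norm(A, p):
    """Entrywise p-norm: (sum over all entries of |A_ij|^p)^(1/p)."""
    return sum(abs(x) ** p for row in A for x in row) ** (1.0 / p)

def p2_norm(A, p):
    """p-norm of the vector of Euclidean norms of the columns of A."""
    # zip(*A) iterates over the columns of a row-major matrix.
    col_norms = [math.sqrt(sum(x * x for x in col)) for col in zip(*A)]
    return sum(c ** p for c in col_norms) ** (1.0 / p)

# Illustrative 2 x 3 matrix (d = 2 rows, n = 3 columns).
A = [[3.0, 0.0, 1.0],
     [4.0, 2.0, 1.0]]

p = 1
d = len(A)
lp = entrywise_p_norm(A, p)   # 3 + 0 + 1 + 4 + 2 + 1 = 11
lp2 = p2_norm(A, p)           # 5 + 2 + sqrt(2)

# Standard norm inequality for 1 <= p <= 2, applied column by column:
# ||A||_{p,2} <= ||A||_p <= d^(1/p - 1/2) * ||A||_{p,2}
assert lp2 <= lp <= d ** (1.0 / p - 0.5) * lp2
```

At p = 2 both functions return the Frobenius norm. For smaller p they differ by at most a factor of d^(1/p - 1/2), where d is the number of rows. This is a general inequality between vector norms, not a statement of the paper's analysis.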
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Blind Source Separation Techniques
