High-Dimensional Geometric Streaming for Nearly Low Rank Data
Hossein Esfandiari, Vahab Mirrokni, Praneeth Kacham, David P. Woodruff, Peilin Zhong

TL;DR
This paper introduces efficient streaming algorithms with strong coresets for high-dimensional $\ell_p$ subspace approximation, enabling near-optimal solutions for low-rank data in large datasets.
Contribution
It presents a deterministic coreset construction for $\ell_\infty$ subspace approximation and extends it to general $\ell_p$ cases, improving streaming algorithms for high-dimensional geometric problems.
Findings
Deterministic coreset for $\ell_\infty$ subspace approximation with near-tight distortion.
$\mathrm{poly}(k, \log n)$ approximation algorithms for $\ell_p$ subspace approximation.
Enhanced streaming algorithms for geometric problems like width, convex hull, and volume estimation.
Abstract
We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$ such that $\left(\sum_{i=1}^{n} d(a_i, V)^p\right)^{1/p}$ is minimized, where $d(a, V)$ denotes the Euclidean distance between $a$ and $V$ defined as $\min_{v \in V} \|a - v\|_2$. When $p = \infty$, we need to find a subspace $V$ that minimizes $\max_i d(a_i, V)$. For $\ell_\infty$ subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a $\mathrm{poly}(k, \log n)$ approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For $\ell_p$ subspace approximation, we show that suitably scaling the points and then…
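As an illustration of the objective described in the abstract, the following minimal sketch evaluates the cost of a candidate subspace: the $\ell_p$ norm of the vector of Euclidean point-to-subspace distances, with the maximum distance used when $p = \infty$. It is not the paper's streaming algorithm or coreset construction; it only computes the quantity being minimized. The function names (`orthonormalize`, `distance_to_subspace`, `subspace_cost`) and the example data are hypothetical, chosen for this sketch.

```python
import math

def orthonormalize(basis):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip numerically dependent vectors
            ortho.append([wi / norm for wi in w])
    return ortho

def distance_to_subspace(a, ortho):
    """Euclidean distance from point a to the span of an orthonormal basis."""
    residual = list(a)
    for q in ortho:
        dot = sum(ri * qi for ri, qi in zip(residual, q))
        residual = [ri - dot * qi for ri, qi in zip(residual, q)]
    return math.sqrt(sum(ri * ri for ri in residual))

def subspace_cost(points, basis, p):
    """l_p subspace approximation cost; p = math.inf gives the max distance."""
    ortho = orthonormalize(basis)
    dists = [distance_to_subspace(a, ortho) for a in points]
    if math.isinf(p):
        return max(dists)
    return sum(d ** p for d in dists) ** (1.0 / p)

# Hypothetical example: three points in R^3, candidate subspace = the xy-plane.
points = [(1.0, 0.0, 3.0), (0.0, 2.0, 4.0), (5.0, 5.0, 0.0)]
basis = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # spans the xy-plane

cost_l1 = subspace_cost(points, basis, 1)          # 3 + 4 + 0 = 7
cost_l2 = subspace_cost(points, basis, 2)          # sqrt(9 + 16 + 0) = 5
cost_linf = subspace_cost(points, basis, math.inf) # max(3, 4, 0) = 4
```

The distances of the three points to the xy-plane are their z-coordinates (3, 4, and 0), so the three costs differ only in how those distances are aggregated.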
Peer Reviews
Decision·ICML 2024 Poster
The authors present a rather simple streaming algorithm for the $\ell_p$-subspace approximation problem and show its practical value via experiments.
The authors do not present the state of the art for $\ell_\infty$-subspace approximation. For instance, when k is 0 or 1, there are O(1)-approximations (see, for instance, Chan and Pathak, CGTA 2014; Agarwal and Sharathkumar, SODA 2010 and Algorithmica 2015; and some other follow-up work). Although these apply only to restricted k, they achieve significantly better approximation ratios. Does your algorithm achieve similar ratios when k is small? These algorithms are equally simple: Does your algorithm have a better
Originality: First paper to provide streaming algorithms for the outer $(d-k)$-radius estimation problem. Their coreset algorithm is new and as they show has several applications. Would be of future interest to researchers working on other related problems. Quality and Clarity: Paper is very well-written, easy to understand. I have not checked all the technical details in the proofs but they are very well-explained and it is unlikely that there are any major issues. Significance: The paper's k
Not any that I can see right now.
- The study is well-motivated. In particular, the problem is related to clustering and subspace approximation which are fundamental ML/data analysis tasks, and the streaming setting addresses the computational issues of ML in the big data era. - The result can also be applied to improve a recent paper [17] in a certain case, which is a nice application that shows the theoretical relevance of the paper - The paper also provides experiments, which indicate that the seemingly complicated steps ca
- The paper is quite technical and is not easy to understand, especially for a general audience. In addition, too many results are squeezed into the 9 pages. In my view, the authors could focus on the main result, Theorem 1.1, which alone should already fit the volume of an ICLR paper (considering the 9 pages of the main text). - I don't see a related work section. Since your main technique is coresets, it might make sense to mention related work on coresets. - In fact, the discussion o
Taxonomy
Topics: Data Stream Mining Techniques · Image and Video Quality Assessment · Data Management and Algorithms
