Continuous Matrix Approximation on Distributed Data
Mina Ghashami, Jeff M. Phillips, Feifei Li

TL;DR
This paper introduces algorithms for approximating data matrices in distributed streaming environments, tracking the norm of the matrix along any direction with low communication overhead.
Contribution
It presents novel deterministic algorithms for distributed matrix approximation that maintain small matrices and achieve low communication costs in streaming settings.
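As general background (a standard property of this error measure, not a description of the paper's own protocol), suppose the rows of A are split across m sites as A_1, …, A_m, and each site i holds a smaller matrix B_i with |‖A_i x‖² − ‖B_i x‖²| ≤ ε‖A_i‖_F² for every unit vector x. Stacking the B_i then gives an approximation of A with the same relative guarantee, because the squared norm of a stacked matrix along x is the sum of the per-block squared norms:

```latex
\begin{aligned}
\left|\,\|Ax\|^2 - \|Bx\|^2\,\right|
  &= \Bigl|\sum_{i=1}^{m} \bigl(\|A_i x\|^2 - \|B_i x\|^2\bigr)\Bigr|
   \le \sum_{i=1}^{m} \bigl|\,\|A_i x\|^2 - \|B_i x\|^2\,\bigr| \\
  &\le \varepsilon \sum_{i=1}^{m} \|A_i\|_F^2
   = \varepsilon \|A\|_F^2,
\qquad B = \begin{bmatrix} B_1 \\ \vdots \\ B_m \end{bmatrix}.
\end{aligned}
```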
Findings
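Algorithms achieve communication complexity O((m/ε) log(βN)), where m is the number of distributed sites.
Methods provide provable guarantees: along any unit direction, the squared norm of the approximation differs from that of the original matrix by at most ε times the matrix's squared Frobenius norm.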
Experimental results demonstrate practical efficiency on large datasets.
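The shape of the O((m/ε) log(βN)) bound can be illustrated on the simpler scalar problem of tracking ‖A‖_F², the sum of squared row norms. The sketch below is a minimal pure-Python simulation of a standard threshold-based distributed counting protocol, not the paper's matrix algorithm; the classes `Coordinator` and `Site`, the function `simulate`, and all parameter names are hypothetical. Each site reports its buffered mass once it reaches (ε/m)·τ, where τ is the coordinator's last broadcast estimate, and the coordinator re-broadcasts whenever its estimate doubles. Roughly m/ε reports arrive per doubling round, and the number of rounds grows with the logarithm of the total mass (at most βN if, as an assumption here, β bounds the squared norm of any row).

```python
import random


class Coordinator:
    """Hypothetical coordinator tracking an estimate of ||A||_F^2."""

    def __init__(self, num_sites: int, eps: float) -> None:
        self.num_sites = num_sites
        self.eps = eps
        self.estimate = 0.0  # total squared norm reported so far
        self.tau = 0.0       # last estimate broadcast to all sites
        self.messages = 0    # site reports plus broadcast messages

    def receive(self, mass: float) -> None:
        self.messages += 1
        self.estimate += mass
        # Re-broadcast once the estimate has doubled since the last broadcast.
        if self.estimate >= 2 * self.tau:
            self.tau = self.estimate
            self.messages += self.num_sites  # one message per site


class Site:
    """Hypothetical site that buffers squared row norms locally."""

    def __init__(self, coord: Coordinator) -> None:
        self.coord = coord
        self.unreported = 0.0

    def observe(self, row: list[float]) -> None:
        self.unreported += sum(v * v for v in row)
        threshold = (self.coord.eps / self.coord.num_sites) * self.coord.tau
        if self.unreported >= threshold:
            self.coord.receive(self.unreported)
            self.unreported = 0.0


def simulate(m: int = 4, eps: float = 0.1, n: int = 10_000, d: int = 5, seed: int = 0):
    rng = random.Random(seed)
    coord = Coordinator(m, eps)
    sites = [Site(coord) for _ in range(m)]
    true_total = 0.0
    for _ in range(n):
        row = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        true_total += sum(v * v for v in row)
        sites[rng.randrange(m)].observe(row)
        # Invariant: (1 - eps) * ||A||_F^2 <= estimate <= ||A||_F^2 (up to rounding).
        assert (1 - eps) * true_total <= coord.estimate <= true_total * (1 + 1e-9)
    return true_total, coord.estimate, coord.messages


if __name__ == "__main__":
    total, est, msgs = simulate()
    print(f"true={total:.1f} estimate={est:.1f} messages={msgs} rows=10000")
```

Between reports, each site buffers less than (ε/m)·τ, so the coordinator's estimate falls short of the true value by less than ε·τ ≤ ε·(estimate); this is why the simulation can assert the bound after every row while sending far fewer messages than rows.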
Abstract
Tracking and approximating data matrices in streaming fashion is a fundamental challenge. The problem requires more care and attention when data comes from multiple distributed sites, each receiving a stream of data. This paper considers the problem of "tracking approximations to a matrix" in the distributed streaming model. In this model, there are m distributed sites, each observing a distinct stream of data (where each element is a row of a distributed matrix) and having a communication channel with a coordinator; the goal is to track an ε-approximation to the norm of the matrix along any direction. To that end, we present novel algorithms to address the matrix approximation problem. Our algorithms maintain a smaller matrix B, as an approximation to a distributed streaming matrix A, such that for any unit vector x: | ‖Ax‖² − ‖Bx‖² | ≤ ε‖A‖_F². Our algorithms work in…
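For small inputs, the guarantee above can be checked numerically. Since x^T(AᵀA − BᵀB)x = ‖Ax‖² − ‖Bx‖², the worst case over unit vectors x equals the largest absolute eigenvalue of AᵀA − BᵀB. The sketch below is a pure-Python check built on that observation; the function names are hypothetical, and the eigenvalues come from a textbook cyclic Jacobi rotation method swapped in for illustration, not from anything in the paper. It only verifies a given B; it does not construct one.

```python
import math


def gram_difference(A, B):
    """Return C = A^T A - B^T B for matrices given as lists of rows with d columns."""
    d = len(A[0])
    C = [[0.0] * d for _ in range(d)]
    for rows, sign in ((A, 1.0), (B, -1.0)):
        for row in rows:
            for i in range(d):
                for j in range(d):
                    C[i][j] += sign * row[i] * row[j]
    return C


def symmetric_eigenvalues(S, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    M = [row[:] for row in S]
    for _ in range(max_sweeps):
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new M[p][q] is zero.
                theta = (M[q][q] - M[p][p]) / (2.0 * M[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # M <- M P (columns p and q)
                    mkp, mkq = M[k][p], M[k][q]
                    M[k][p] = c * mkp - s * mkq
                    M[k][q] = s * mkp + c * mkq
                for k in range(n):  # M <- P^T M (rows p and q)
                    mpk, mqk = M[p][k], M[q][k]
                    M[p][k] = c * mpk - s * mqk
                    M[q][k] = s * mpk + c * mqk
    return [M[i][i] for i in range(n)]


def directional_error(A, B):
    """max over unit x of | ||Ax||^2 - ||Bx||^2 |, i.e. the largest |eigenvalue| of A^T A - B^T B."""
    return max(abs(lam) for lam in symmetric_eigenvalues(gram_difference(A, B)))


def satisfies_guarantee(A, B, eps):
    """Check | ||Ax||^2 - ||Bx||^2 | <= eps * ||A||_F^2 for every unit vector x."""
    frob_sq = sum(v * v for row in A for v in row)
    return directional_error(A, B) <= eps * frob_sq
```

For example, with A having rows (3, 0) and (0, 4) and an empty B, the directional error is ‖A‖_2² = 16 against ‖A‖_F² = 25, so `satisfies_guarantee` returns True for ε = 0.7 and False for ε = 0.6.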
Taxonomy
Topics: Data Stream Mining Techniques · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
