Subsampled online matrix factorization with convergence guarantees
Arthur Mensch (PARIETAL), Julien Mairal (LEAR), Gaël Varoquaux (PARIETAL), Bertrand Thirion (PARIETAL)

TL;DR
This paper introduces a scalable online matrix factorization algorithm that uses subsampling and low-dimensional statistics to efficiently handle large matrices, with proven convergence guarantees and improved speed over previous methods.
Contribution
The proposed method is the first to combine subsampling with convergence guarantees in online matrix factorization for large-scale data.
Findings
Achieves large speed-ups over previous online algorithms that do not subsample, thanks to feature redundancy in high-dimensional data.
Scales to matrices containing more than 1TB of data with a reasonable memory footprint.
Guarantees convergence to a stationary point.
Abstract
We present a matrix factorization algorithm that scales to input matrices that are large in both dimensions (i.e., that contain more than 1TB of data). The algorithm streams the matrix columns while subsampling them, resulting in low complexity per iteration and a reasonable memory footprint. In contrast to previous online matrix factorization methods, our approach relies on low-dimensional statistics from past iterates to control the extra variance introduced by subsampling. We present a convergence analysis that guarantees that the algorithm reaches a stationary point of the problem. Large speed-ups can be obtained compared to previous online algorithms that do not perform subsampling, thanks to the feature redundancy that often exists in high-dimensional settings.
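To make the idea of streaming columns while subsampling them more concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the authors' algorithm: the function name `subsampled_online_mf`, its parameters (`reduction`, `ridge`, `n_epochs`), the ridge-regularized code step, and the per-row averaging of statistics are all hypothetical choices made for this example. It does not reproduce the paper's variance-control scheme or the conditions of its convergence analysis.

```python
import numpy as np


def subsampled_online_mf(X, n_components=3, reduction=4, ridge=0.1,
                         n_epochs=3, seed=0):
    """Illustrative sketch: learn D (p x k) so that X ~ D @ codes,
    observing only a random subset of rows of each streamed column."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    k = n_components
    n_rows = max(1, p // reduction)      # rows observed per streamed column
    scale = p / n_rows                   # rescales sums over observed rows

    # Dictionary initialized with unit-norm columns.
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    # Running statistics kept across iterations (assumed form).
    A = np.zeros((k, k))                 # average of code code^T
    B = np.zeros((p, k))                 # per-row average of x * code^T
    row_counts = np.zeros(p)             # how often each row was observed

    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):     # stream columns in random order
            t += 1
            rows = rng.choice(p, size=n_rows, replace=False)
            x_sub, D_sub = X[rows, i], D[rows]

            # Ridge-regularized code computed from the observed rows only.
            gram = scale * D_sub.T @ D_sub + ridge * np.eye(k)
            code = np.linalg.solve(gram, scale * D_sub.T @ x_sub)

            # Update the running averages.
            A += (np.outer(code, code) - A) / t
            row_counts[rows] += 1
            w = (1.0 / row_counts[rows])[:, None]
            B[rows] += w * (np.outer(x_sub, code) - B[rows])

            # Block coordinate descent restricted to the observed rows.
            for j in range(k):
                step = (B[rows, j] - D[rows] @ A[:, j]) / max(A[j, j], 1e-10)
                D[rows, j] += step
                # Project column j onto the unit ball. This norm costs O(p);
                # it is kept here only for simplicity.
                D[:, j] /= max(np.linalg.norm(D[:, j]), 1.0)
    return D


# Small synthetic usage example on a rank-3 matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 200))
D = subsampled_online_mf(X)
print(D.shape)
```

In this sketch each iteration touches only `p // reduction` rows for the code computation and the dictionary update, which is where the per-iteration saving would come from; the statistics `A` and `B` are small relative to the streamed matrix.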
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications
