# Stochastic Subsampling for Factorizing Huge Matrices

**Authors:** Arthur Mensch (PARIETAL, NEUROSPIN), Julien Mairal (Thoth), Bertrand Thirion (PARIETAL, NEUROSPIN), Gael Varoquaux (NEUROSPIN, PARIETAL)

arXiv: 1701.05363 · 2017-11-15

## TL;DR

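This paper introduces a matrix-factorization algorithm for matrices with huge numbers of both rows and columns. It streams matrix columns and subsamples their rows at each iteration, comes with convergence guarantees to a stationary point, and is demonstrated on 2 TB of functional MRI data and 103 GB of hyperspectral image patches.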

## Contribution

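The proposed method combines column streaming with row subsampling, which lowers the per-iteration time complexity compared to a simple streaming algorithm while keeping convergence guarantees. The learned factors may be sparse or dense and/or non-negative, so the method covers dictionary learning, sparse component analysis, and non-negative matrix factorization.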

## Key findings

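- Obtains significant speed-ups over state-of-the-art algorithms on both test problems, which involve different penalties on rows and columns.
- Demonstrated on massive functional MRI data (2 TB) and on patches extracted from hyperspectral images (103 GB).
- Comes with convergence guarantees to reach a stationary point of the matrix-factorization problem.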

## Abstract

We present a matrix-factorization algorithm that scales to input matrices with huge numbers of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with convergence guarantees to reach a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.
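The core idea in the abstract (stream columns, and at each iteration work only on a random subset of rows) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: it swaps in a ridge-regression code and a plain normalized gradient step on the touched rows, with no sparsity or non-negativity penalties and no convergence guarantee. The function names `subsampled_online_mf` and `relative_error`, and the parameters `reduction`, `ridge`, and `step`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def subsampled_online_mf(X, n_components, reduction=4, ridge=1e-2,
                         step=0.5, n_epochs=5, seed=0):
    """Toy streaming factorization X ~ D @ codes using row subsampling.

    Assumption: a simplified stand-in for the paper's method, not a
    reimplementation of it.
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    # Number of rows observed per iteration (at least n_components, at most p).
    n_rows = min(p, max(n_components, p // reduction))
    # Random dictionary with unit-norm columns.
    D = rng.standard_normal((p, n_components))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    eye = np.eye(n_components)
    for _ in range(n_epochs):
        for j in rng.permutation(n):            # stream the columns
            rows = rng.choice(p, size=n_rows, replace=False)
            x_sub = X[rows, j]                  # subsampled sample
            D_sub = D[rows]                     # matching dictionary rows
            # Ridge code computed from the observed rows only.
            alpha = np.linalg.solve(D_sub.T @ D_sub + ridge * eye,
                                    D_sub.T @ x_sub)
            residual = D_sub @ alpha - x_sub
            # Normalized gradient step; only the sampled rows are updated.
            D[rows] -= step * np.outer(residual, alpha) / (alpha @ alpha + 1e-12)
    return D


def relative_error(X, D):
    """Relative Frobenius error of projecting X onto the span of D."""
    codes, *_ = np.linalg.lstsq(D, X, rcond=None)
    return np.linalg.norm(X - D @ codes) / np.linalg.norm(X)


# Synthetic rank-5 data: 60 rows, 400 columns.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((60, 5)) @ data_rng.standard_normal((5, 400))

D_init = subsampled_online_mf(X, 5, n_epochs=0)   # untrained dictionary
D_fit = subsampled_online_mf(X, 5)                # after 5 passes
print("error before:", relative_error(X, D_init))
print("error after: ", relative_error(X, D_fit))
```

With `reduction=4`, each iteration reads and updates only a quarter of the rows, so the per-sample cost of the code computation and dictionary update scales with the subsample size rather than with the full row dimension. That is the trade-off the abstract describes, shown here in its simplest form.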

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.05363/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1701.05363/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1701.05363/full.md

---
Source: https://tomesphere.com/paper/1701.05363