A Survey of Singular Value Decomposition Methods for Distributed Tall/Skinny Data

Drew Schmidt

arXiv:2009.00761·cs.MS·September 3, 2020

TL;DR

This survey reviews three algorithms for computing the SVD of large, tall/skinny matrices in distributed environments, highlighting their performance on modern supercomputers and discussing future directions.

Contribution

It provides a comprehensive comparison of distributed SVD algorithms for tall/skinny data, including performance results on CPU and GPU architectures.

Findings

01. Algorithms perform efficiently on the Summit supercomputer

02. GPU implementations show significant speedups

03. Discussion of alternative approaches for large-scale SVD

Abstract

The Singular Value Decomposition (SVD) is one of the most important matrix factorizations, with applications across numerous domains. In statistics and data analysis, common applications of the SVD include Principal Components Analysis (PCA) and linear regression. These applications usually arise on data that has far more rows than columns, so-called "tall/skinny" matrices. In the big data analytics context, this may take the form of hundreds of millions to billions of rows with only a few hundred columns. There is a need, therefore, for fast, accurate, and scalable tall/skinny SVD implementations that can fully utilize modern computing resources. To that end, we present a survey of three different algorithms for computing the SVD for these kinds of tall/skinny data layouts using MPI for communication. We contextualize these with common big data…
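To make the tall/skinny setting concrete, here is a minimal sketch of one well-known approach, the Gram (cross-product) matrix method. It is an illustration only: the abstract above does not name the three surveyed algorithms, so this should not be read as the author's implementation. The function name `tall_skinny_svd_gram` is hypothetical, and the list of row blocks is an assumed stand-in for data spread across MPI ranks, with the in-process sum playing the role of an allreduce.

```python
import numpy as np


def tall_skinny_svd_gram(row_blocks):
    """Return (sigma, V) for the matrix formed by stacking row_blocks.

    Each block holds a contiguous slice of rows, as one process would in a
    row-distributed layout. Only the small n-by-n Gram matrix is combined.
    """
    n = row_blocks[0].shape[1]
    gram = np.zeros((n, n))
    for block in row_blocks:
        # Local cross-product; in a real MPI code these partial sums would
        # be combined with an allreduce instead of a Python loop.
        gram += block.T @ block

    # A^T A = V diag(sigma^2) V^T, so an eigendecomposition gives V and sigma.
    evals, evecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]          # reorder to descending
    sigma = np.sqrt(np.clip(evals[order], 0.0, None))
    v = evecs[:, order]
    return sigma, v


rng = np.random.default_rng(0)
a = rng.standard_normal((10_000, 5))         # far more rows than columns
blocks = np.array_split(a, 4, axis=0)        # pretend there are 4 ranks

sigma, v = tall_skinny_svd_gram(blocks)

# Left singular vectors can be recovered block by block with no further
# communication: U_i = A_i V diag(1 / sigma).
u = np.vstack([block @ v / sigma for block in blocks])
```

The appeal of this approach is that only an n-by-n matrix crosses the network, regardless of the row count. The cost is numerical: forming the cross-product squares the condition number of the input, so accuracy degrades for ill-conditioned data.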
