numpywren: serverless linear algebra

Vaishaal Shankar; Karl Krauth; Qifan Pu; Eric Jonas; Shivaram Venkataraman; Ion Stoica; Benjamin Recht; Jonathan Ragan-Kelley

arXiv:1810.09679 · cs.DC · October 24, 2018 · 67 cites

Open Access

TL;DR

numpywren demonstrates that serverless architectures can efficiently perform large-scale linear algebra operations with elastic scalability and ease of management, achieving performance close to traditional supercomputing solutions for certain algorithms.

Contribution

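The paper introduces numpywren, a system for linear algebra built on a serverless architecture, and LAmbdaPACK, a domain-specific language for implementing highly parallel linear algebra algorithms in a serverless setting. Together they enable scalable and fault-tolerant computations.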
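To make the idea of stateless workers over disaggregated storage concrete, here is a minimal sketch of a tiled matrix multiply. It is not numpywren's API and not LAmbdaPACK syntax: the dict `store` is a hypothetical stand-in for an object store, the thread pool is a stand-in for serverless function invocations, and the names `put_tiles`, `tile_task`, and `tiled_matmul` are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a plain dict stands in for a remote object store.
store = {}

def put_tiles(name, matrix, b):
    """Split a square matrix (list of lists) into b x b tiles and write each to the store."""
    n = len(matrix)
    for i in range(0, n, b):
        for j in range(0, n, b):
            store[(name, i // b, j // b)] = [row[j:j + b] for row in matrix[i:i + b]]

def tile_task(i, j, k_tiles):
    """Stateless task: read the input tiles it needs, compute one output tile, write it back."""
    rows = len(store[("A", i, 0)])
    cols = len(store[("B", 0, j)][0])
    acc = [[0] * cols for _ in range(rows)]
    for k in range(k_tiles):
        a = store[("A", i, k)]
        bt = store[("B", k, j)]
        for r in range(rows):
            for c in range(cols):
                acc[r][c] += sum(a[r][t] * bt[t][c] for t in range(len(bt)))
    store[("C", i, j)] = acc

def tiled_matmul(A, B, b, workers=4):
    """Multiply square matrices A and B using independent per-tile tasks."""
    tiles = (len(A) + b - 1) // b
    put_tiles("A", A, b)
    put_tiles("B", B, b)
    # Assumption: threads stand in for independently scheduled serverless workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(tile_task, i, j, tiles)
                   for i in range(tiles) for j in range(tiles)]
        for f in futures:
            f.result()  # re-raise any worker error
    # Gather the output tiles back into one matrix.
    C = []
    for i in range(tiles):
        row_tiles = [store[("C", i, j)] for j in range(tiles)]
        for r in range(len(row_tiles[0])):
            C.append([x for t in row_tiles for x in t[r]])
    return C
```

Because each task keeps no state between calls and communicates only through the store, a failed task can simply be rerun, and the number of workers can change freely between runs. This is the general property the summary refers to as elasticity and fault tolerance, shown here only in toy form.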

Findings

01. Performance within 33% of ScaLAPACK for key algorithms

02. Up to 240% better CPU-hour efficiency due to elasticity

03. Limitations in network efficiency affect some algorithms

Abstract

Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of storage and compute resources in so-called "serverless" environments, combined with compute-intensive workload characteristics, can be exploited to achieve elastic scalability and ease of management. We present numpywren, a system for linear algebra built on a serverless architecture. We also introduce LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting. We show that, for certain linear algebra algorithms such as matrix…

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques