numpywren: serverless linear algebra
Vaishaal Shankar, Karl Krauth, Qifan Pu, Eric Jonas, Shivaram Venkataraman, Ion Stoica, Benjamin Recht, Jonathan Ragan-Kelley

TL;DR
numpywren demonstrates that serverless architectures can efficiently perform large-scale linear algebra operations with elastic scalability and ease of management, achieving performance close to traditional supercomputing solutions for certain algorithms.
Contribution
The paper introduces numpywren, a serverless system for linear algebra, and LAmbdaPACK, a language for parallel algorithms, enabling scalable and fault-tolerant computations.
Findings
Completion time within 33% of ScaLAPACK for certain algorithms
Up to 240% better compute efficiency (total CPU-hours) due to elasticity
Limitations in network efficiency affect some algorithms
Abstract
Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of storage and compute resources in so-called "serverless" environments, combined with compute-intensive workload characteristics, can be exploited to achieve elastic scalability and ease of management. We present numpywren, a system for linear algebra built on a serverless architecture. We also introduce LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting. We show that, for certain linear algebra algorithms such as matrix…
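To make the idea of disaggregated storage and compute concrete, the following is a minimal sketch in Python with NumPy. It is not numpywren's API and not LAmbdaPACK syntax. A plain dictionary stands in for a remote object store, and the names `put_tiles`, `matmul_task`, and `tiled_matmul` are hypothetical, chosen only for this illustration. Each task is stateless: it reads the tiles it needs from the store, computes one output tile, and writes it back, so tasks could in principle run on independent short-lived workers.

```python
import numpy as np


def put_tiles(store, name, matrix, block):
    """Split `matrix` into tiles of at most block x block and write them to `store`.

    `store` is a plain dict standing in for an object store (an assumption
    made for this sketch). Returns the tile grid shape (rows, cols).
    """
    n_rows, n_cols = matrix.shape
    for i in range(0, n_rows, block):
        for j in range(0, n_cols, block):
            tile = matrix[i:i + block, j:j + block].copy()
            store[(name, i // block, j // block)] = tile
    # Ceiling division gives the number of tiles along each axis.
    return (-(-n_rows // block), -(-n_cols // block))


def matmul_task(store, i, j, k_tiles):
    """Stateless task: read a row of A tiles and a column of B tiles,
    accumulate their products, and write output tile C[i, j] to the store."""
    acc = None
    for k in range(k_tiles):
        prod = store[("A", i, k)] @ store[("B", k, j)]
        acc = prod if acc is None else acc + prod
    store[("C", i, j)] = acc


def tiled_matmul(a, b, block):
    """Multiply `a` and `b` tile by tile, using only the store for data exchange."""
    store = {}
    a_rows, a_cols = put_tiles(store, "A", a, block)
    _, b_cols = put_tiles(store, "B", b, block)
    # Every (i, j) pair is an independent task; here they run sequentially,
    # but no task depends on another task's in-memory state.
    tasks = [(i, j) for i in range(a_rows) for j in range(b_cols)]
    for i, j in tasks:
        matmul_task(store, i, j, a_cols)
    # Reassemble the output tiles into one matrix.
    return np.block(
        [[store[("C", i, j)] for j in range(b_cols)] for i in range(a_rows)]
    )
```

Because the tasks share nothing except the store, the number of workers could grow or shrink with the number of pending tasks, which is the kind of elasticity the abstract refers to. A real system would also need scheduling, fault handling, and dependency tracking for algorithms whose tiles depend on one another; none of that is modeled here.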
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
