Large Scale Distributed Linear Algebra With Tensor Processing Units

Adam G.M. Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, and Guifre Vidal

arXiv:2112.09017·physics.comp-ph·September 14, 2022

TL;DR

This paper demonstrates how Google TPUs, originally designed for machine learning, can be repurposed as large-scale dense linear algebra supercomputers, achieving high performance and scalability for matrix operations.

Contribution

The paper shows that TPUs can serve as dense linear algebra accelerators at scale, using their fast inter-core interconnects, two-dimensional network topology, and high-bandwidth memory for efficient distributed matrix computations.

Findings

01

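Operating in float32 precision, a full 2048-core pod of third-generation TPUs can multiply two matrices of linear size $N = 2^{20} = 1048576$ (about 1 million by 1 million) in about 2 minutes.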

02

Distributed algorithms built around large single-core matrix multiplications also scale for QR decomposition, the solution of linear systems, and the computation of matrix functions.

03

TPUs' high-bandwidth memory and fast inter-core interconnects let distributed matrix multiplication quickly become compute-bound, so the matrix-multiply units dominate the runtime.
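
To put the first finding in perspective, the short Python sketch below works out the implied throughput and memory footprint. It assumes the standard operation count of roughly $2N^3$ floating-point operations for a dense $N \times N$ product, takes "about 2 minutes" as exactly 120 seconds, and assumes each matrix is spread evenly over all cores. The variable and function names are illustrative only and do not come from the paper.

```python
# Back-of-the-envelope check of the headline matrix-multiplication figure.
# Assumptions (not from the paper): 2*N^3 operation count, exactly 120 s
# of runtime, and an even split of each matrix across all cores.

N = 2**20              # linear matrix size quoted in the abstract
CORES = 2048           # cores in the full pod
SECONDS = 120.0        # "about 2 minutes", taken as exactly 120 s
BYTES_PER_FLOAT32 = 4  # float32 storage size


def matmul_flops(n):
    """Approximate floating-point operations for a dense n x n product."""
    return 2 * n**3


total_flops = matmul_flops(N)
pod_rate = total_flops / SECONDS           # FLOP/s for the whole pod
per_core_rate = pod_rate / CORES           # FLOP/s per core
matrix_bytes = N * N * BYTES_PER_FLOAT32   # storage for one matrix
per_core_bytes = matrix_bytes // CORES     # one matrix's share per core

print(f"total work:      {total_flops:.3e} FLOP")
print(f"pod throughput:  {pod_rate / 1e15:.1f} PFLOP/s")
print(f"per-core rate:   {per_core_rate / 1e12:.1f} TFLOP/s")
print(f"one matrix:      {matrix_bytes / 2**40:.0f} TiB")
print(f"per-core share:  {per_core_bytes / 2**30:.0f} GiB per matrix")
```

Under these assumptions, one product is about $2.3 \times 10^{18}$ operations, which corresponds to roughly 19 PFLOP/s across the pod, or about 9.4 TFLOP/s per core. A single float32 matrix of this size occupies 4 TiB, or 2 GiB per core when divided evenly.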

Abstract

We have repurposed Google Tensor Processing Units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast inter-core interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: operating in float32 precision, a full 2048-core pod of third-generation TPUs can multiply two matrices with linear size $N = 2^{20} = 1048576$ in about 2 minutes. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present (i) QR decomposition; (ii) resolution of linear systems; and…
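
The abstract does not name the distributed matrix multiplication algorithm it uses. As an illustration of how such a product can be organized on a two-dimensional grid of cores, the sketch below simulates a SUMMA-style block algorithm in a single Python process. This is an assumed, textbook-style scheme and not necessarily the authors' implementation; every function name here is hypothetical. Each "core" owns one block of each matrix, and the local `matmul` call stands in for the large single-core multiplication that an MXU would perform.

```python
# Single-process simulation of a SUMMA-style block matrix product on a
# p x p grid. Illustrative sketch only; not the paper's implementation.

def matmul(a, b):
    """Dense product of two list-of-lists matrices (one core's local work)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m, p):
    """Split a square matrix into a p x p grid of equal square blocks."""
    s = len(m) // p
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(p)] for i in range(p)]


def join(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([x for block in block_row for x in block[r]])
    return out


def summa(a, b, p):
    """Compute a @ b with blocks laid out on a simulated p x p core grid."""
    n = len(a)
    assert n % p == 0, "matrix size must be divisible by the grid size"
    s = n // p
    A, B = split(a, p), split(b, p)
    C = [[[[0] * s for _ in range(s)] for _ in range(p)] for _ in range(p)]
    for k in range(p):
        # On real hardware, A[i][k] would be broadcast along grid row i
        # and B[k][j] along grid column j before the local multiply.
        for i in range(p):
            for j in range(p):
                C[i][j] = add(C[i][j], matmul(A[i][k], B[k][j]))
    return join(C)


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
print(summa(a, b, 2) == matmul(a, b))
```

In a scheme like this, each step moves only one block per core over the network while performing a full block multiplication locally. Because the local work grows faster than the communication as blocks get larger, the computation can become compute-bound in the way the abstract describes.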
