FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems

Rui Ma; Evangelos Georganas; Alexander Heinecke; Andrew Boutros; Eriko Nurvitadhi

arXiv:2204.10943 · cs.DC · April 26, 2022

Open Access

TL;DR

This paper introduces FPGA-based AI smart NICs that accelerate collective communication, notably all-reduce, in distributed AI training. It reports a 1.6x performance improvement on a 6-node system and an estimated 2.5x at 32 nodes.

Contribution

The paper presents an FPGA-based smart NIC design that accelerates all-reduce operations and optimizes network bandwidth utilization, making distributed AI training more scalable and efficient.

Findings

01. Achieved 1.6x performance improvement on 6 nodes

02. Validated an analytical model for larger system scaling

03. Estimated 2.5x performance gain at 32 nodes

Abstract

Rapid advances in artificial intelligence (AI) technology have led to significant accuracy improvements in a myriad of application domains at the cost of larger and more compute-intensive models. Training such models on massive amounts of data typically requires scaling to many compute nodes and relies heavily on collective communication algorithms, such as all-reduce, to exchange the weight gradients between different nodes. The overhead of these collective communication operations in a distributed AI training system can bottleneck its performance, with more pronounced effects as the number of nodes increases. In this paper, we first characterize the all-reduce operation overhead by profiling distributed AI training. Then, we propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs) to accelerate all-reduce…
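For readers unfamiliar with all-reduce, the following is a minimal software sketch of one common variant, ring all-reduce, in which every node ends up with the element-wise sum of all nodes' gradients. It is an illustration only: the abstract does not say which all-reduce algorithm the authors use or how the smart NIC implements it, and the function name `ring_all_reduce` and the in-memory "nodes" are hypothetical.

```python
from typing import List


def ring_all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over n nodes arranged in a ring.

    grads[i] is the gradient vector held by node i. The vector length
    is assumed (for simplicity) to be divisible by the node count.
    Returns each node's buffer after the operation; inputs are not modified.
    """
    n = len(grads)
    length = len(grads[0])
    assert all(len(g) == length for g in grads), "vectors must match in length"
    assert length % n == 0, "sketch assumes length divisible by node count"
    chunk = length // n
    data = [list(g) for g in grads]  # per-node working copies

    def span(c: int) -> slice:
        """Index range covered by chunk c."""
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, node i sends chunk (i - s) mod n
    # to its right neighbor, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot all payloads first so the sends behave as simultaneous.
        sends = [(i, (i - s) % n, data[i][span((i - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Now node i holds the fully reduced chunk (i + 1) mod n.
    # Phase 2, all-gather: circulate the reduced chunks around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][span((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][span(c)] = payload

    return data
```

Calling `ring_all_reduce` on three nodes holding `[1, 2, 3, 4, 5, 6]`, `[10, 20, 30, 40, 50, 60]`, and `[100, 200, 300, 400, 500, 600]` leaves every node with `[111, 222, 333, 444, 555, 666]`. In this variant each node sends 2(n - 1) chunks of length/n elements, and the number of sequential communication steps grows with the node count, which is one way to see why the overhead becomes more pronounced as nodes are added.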

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Machine Learning and ELM