The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms
Yu Gao, Juan Camilo Vega, Paul Chow

TL;DR
This paper investigates whether large transformer models can be implemented across multiple FPGAs. The authors develop a scalable multi-FPGA platform with supporting tools and demonstrate feasibility with a multi-FPGA I-BERT prototype.
Contribution
It introduces a scalable multi-FPGA platform and tools for large ML applications, and validates the approach with a multi-FPGA I-BERT implementation.
Findings
Multi-FPGA implementation of I-BERT is feasible.
FPGAs can be competitive with GPUs for large ML models.
The platform shows promising performance potential.
Abstract
FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. There has been much evidence showing that single FPGAs can be competitive with GPUs in performance for some computations, especially for low latency, and often much more efficient when power is considered. This suggests that there is merit to exploring the use of multiple FPGAs for large machine learning applications. The challenge with using multiple FPGAs is that there is no commonly-accepted flow for developing and deploying multi-FPGA applications, i.e., there are no tools to describe a large application, map it to multiple FPGAs and then deploy the application on a multi-FPGA platform. In this paper, we explore the feasibility of implementing large transformers using multiple FPGAs by developing a scalable multi-FPGA…
Taxonomy
Topics: Power Transformer Diagnostics and Insulation · Neural Networks and Applications · Electromagnetic Compatibility and Noise Suppression
Methods: I-BERT
