Experience with PCIe streaming on FPGA for high throughput ML inferencing
Piyush Manavar, Manoj Nambiar, Nupur Sumeet, Rekha Singhal, Sharod Choudhary, Amey Pandit

TL;DR
This paper demonstrates that PCIe streaming on FPGA platforms significantly enhances high-throughput ML inference performance and energy efficiency, outperforming CPU and GPU implementations in gradient boosted trees applications.
Contribution
The paper introduces a PCIe streaming approach on FPGA for ML inference, achieving superior throughput and energy efficiency compared to GPU and CPU baselines.
Findings
FPGA with PCIe streaming outperforms GPU and CPU in throughput.
Energy efficiency is 25x better than CPU and 12x better than GPU.
Identifies conditions for optimal FPGA acceleration.
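One way to read the energy-efficiency ratios above is as a comparison of inferences per joule, that is, throughput divided by average power draw. The sketch below assumes that definition, which this summary does not confirm. The function names and the (throughput, power) figures are hypothetical placeholders, not measurements from the paper.

```python
def inferences_per_joule(throughput_per_s: float, avg_power_w: float) -> float:
    """Throughput (inferences/s) divided by average power (W = J/s)."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return throughput_per_s / avg_power_w


def efficiency_ratio(cand_throughput: float, cand_power: float,
                     base_throughput: float, base_power: float) -> float:
    """How many times more inferences per joule the candidate delivers."""
    candidate = inferences_per_joule(cand_throughput, cand_power)
    baseline = inferences_per_joule(base_throughput, base_power)
    return candidate / baseline


# Placeholder values for illustration only (NOT results from the paper).
platform_a = (2_000_000.0, 40.0)   # inferences/s, watts
platform_b = (1_000_000.0, 200.0)  # inferences/s, watts

ratio = efficiency_ratio(*platform_a, *platform_b)
print(f"platform A is {ratio:.1f}x more energy efficient than platform B")
```

Under this definition a platform can gain on the ratio either by raising throughput or by drawing less power, so a reported multiple such as 25x does not by itself say which factor dominates.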
Abstract
Achieving the maximum possible inferencing rate with minimum hardware resources plays a major role in reducing enterprise operational costs. In this paper we explore the use of PCIe streaming on FPGA-based platforms to achieve high throughput. PCIe streaming is a unique capability available on FPGAs that eliminates memory copy overheads. We present results for inference on a gradient boosted trees model for online retail recommendations. We compare these results with popular library implementations on GPU and CPU platforms and observe that the PCIe streaming enabled FPGA implementation achieves the best overall measured performance. We also measure power consumption across all platforms and find that PCIe streaming on the FPGA platform achieves 25x and 12x better energy efficiency than implementations on CPU and GPU platforms, respectively. We…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
