A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks
Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao

TL;DR
This paper presents a versatile FPGA-based acceleration platform for CNN inference in data centers that matches GPU throughput at much lower latency while remaining cost-efficient.
Contribution
The paper introduces a unified FPGA framework with 4,096 DSPs and novel task dispatching and buffering methods for efficient CNN inference acceleration in data centers.
Findings
Achieves up to 4.2 TOP/s 16-bit fixed-point performance.
Matches GPU throughput with over 50x lower latency.
Demonstrates superior FPGA peak performance.
Abstract
Intensive computation is entering data centers in the form of multiple deep learning workloads. To balance compute efficiency, performance, and total cost of ownership (TCO), a field-programmable gate array (FPGA) with reconfigurable logic provides acceptable acceleration capacity and is compatible with diverse computation-sensitive tasks in the cloud. In this paper, we develop an FPGA acceleration platform that uses a unified framework architecture for general-purpose convolutional neural network (CNN) inference acceleration in data centers. To overcome the computation bound, 4,096 DSPs are assembled and shaped as supertile units (SUs) for different types of convolution, providing up to 4.2 TOP/s of 16-bit fixed-point performance at 500 MHz. An interleaved-task-dispatching method is proposed to map the computation across the SUs, and the memory bound is solved by a…
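As a rough check on the peak figure, peak throughput is the DSP count times the operations each DSP completes per cycle times the clock rate. The sketch below assumes each DSP performs one 16-bit multiply-accumulate per cycle, counted as two operations (a multiply and an add); that counting convention is an assumption, not something the abstract states, and `peak_tops` is a hypothetical helper.

```python
def peak_tops(num_dsps: int, clock_hz: float, ops_per_dsp_per_cycle: int = 2) -> float:
    """Return peak throughput in TOP/s (10**12 operations per second).

    Assumes every DSP is busy on every cycle; by default one
    multiply-accumulate per DSP per cycle is counted as two operations.
    """
    return num_dsps * ops_per_dsp_per_cycle * clock_hz / 1e12


if __name__ == "__main__":
    # 4,096 DSPs at 500 MHz, as stated in the abstract.
    print(f"{peak_tops(4096, 500e6):.3f} TOP/s")  # prints "4.096 TOP/s"
```

Under this assumption the result is about 4.1 TOP/s, close to the quoted 4.2 TOP/s; the abstract excerpt does not say exactly how the quoted figure is counted.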
