Revealing Untapped DSP Optimization Potentials for FPGA-Based Systolic Matrix Engines
Jindong Li, Tenglong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, and Yi Zeng

TL;DR
This paper identifies new optimization techniques for DSP48E2 blocks in FPGA-based systolic matrix engines, significantly improving performance and resource efficiency in neural network accelerators and neuromorphic hardware.
Contribution
It introduces novel DSP optimization methods for FPGA systolic architectures, demonstrating substantial resource and power savings in real-world neural network implementations.
Findings
Achieved substantial resource reduction compared to the open-source TPUv1 FPGA implementation and the Xilinx Vitis AI DPU.
Demonstrated power reduction against the same two baselines with the proposed techniques.
Validated applicability to neuromorphic hardware for spiking neural networks.
Abstract
Systolic architectures are widely embraced by neural network accelerators for their superior performance in highly parallelized computation. The DSP48E2s serve as dedicated arithmetic blocks in Xilinx UltraScale series FPGAs and constitute a fundamental component in FPGA-based systolic matrix engines. Harnessing the full potential of DSP48E2s in architectural design can result in significant performance enhancements for systolic architectures on UltraScale series FPGAs. This paper unveils several previously untapped DSP optimization techniques capable of further enhancing FPGA-based systolic matrix engines. We apply these techniques to two well-known systolic architectures: Google TPUv1 and Xilinx Vitis AI DPU. With the proposed techniques, our design achieves substantial resource and power reduction compared to the open-source TPUv1 FPGA implementation and the Vitis AI DPU…
Taxonomy
Topics: Embedded Systems Design Techniques · VLSI and FPGA Design Techniques · Parallel Computing and Optimization Techniques
