ShortcutFusion: From Tensorflow to FPGA-based accelerator with reuse-aware memory allocation for shortcut data

Duy Thanh Nguyen; Hyeonseung Je; Tuan Nghia Nguyen; Soojung Ryu; Kyujoong Lee; and Hyuk-Jae Lee

arXiv:2106.08167·cs.DC·March 8, 2022

TL;DR

ShortcutFusion is an optimization tool for FPGA-based CNN accelerators that increases on-chip reuse of shortcut data, reducing off-chip memory access and improving speed and power efficiency.

Contribution

It introduces a reuse-aware static memory allocation method for shortcut data, optimizing FPGA accelerator performance for residual networks.
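
To make the idea concrete, here is a minimal sketch of why keeping shortcut data on chip reduces off-chip traffic. It is not the paper's allocation algorithm: the `ResidualBlock` class, the `offchip_feature_map_bytes` function, the byte sizes, and the traffic model (every layer reads its input from DRAM and writes its output back, with the elementwise add fused into the last layer) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResidualBlock:
    """Toy residual block: input -> conv layers -> add(input, last output)."""
    input_bytes: int                      # block input, also the shortcut data
    layer_out_bytes: list = field(default_factory=list)  # output size per layer


def offchip_feature_map_bytes(block, onchip_budget_bytes):
    """Estimate off-chip feature-map traffic for one block (hypothetical model).

    Assumptions: each layer reads its input from DRAM and writes its output
    back; the add is fused into the last layer, so its only extra cost is
    re-reading the shortcut, which is avoided when the shortcut fits in the
    on-chip budget and stays resident there.
    """
    traffic = 0
    prev = block.input_bytes
    for out in block.layer_out_bytes:
        traffic += prev + out             # read layer input, write layer output
        prev = out
    keep_on_chip = block.input_bytes <= onchip_budget_bytes
    if not keep_on_chip:
        traffic += block.input_bytes      # shortcut re-read from DRAM for the add
    return traffic, keep_on_chip


KIB = 1024
block = ResidualBlock(input_bytes=256 * KIB,
                      layer_out_bytes=[64 * KIB, 64 * KIB, 256 * KIB])

baseline, _ = offchip_feature_map_bytes(block, onchip_budget_bytes=0)
reuse, kept = offchip_feature_map_bytes(block, onchip_budget_bytes=512 * KIB)
saving = 1 - reuse / baseline
print(f"baseline={baseline} B, with reuse={reuse} B, saving={saving:.0%}")
```

In this toy setting the shortcut re-read accounts for a quarter of the block's feature-map traffic. The numbers are made up and do not correspond to the paper's measurements; a static allocator would make the keep-or-spill decision per block under the real on-chip capacity.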

Findings

01

Accelerator on the Xilinx KCU1500 FPGA card runs 2.8x faster than NVIDIA RTX 2080 Ti (256x256 input size)

02

47.8-84.8% reduction in DRAM access for multiple models

03

Reduces off-chip feature-map access 5.27x compared to baseline

Abstract

The residual block is a very common component in recent state-of-the-art CNNs such as EfficientNet or EfficientDet. Shortcut data accounts for nearly 40% of feature-map accesses in ResNet152 [8]. Most previous DNN compilers and accelerators ignore shortcut data optimization. This paper presents ShortcutFusion, an optimization tool for FPGA-based accelerators with a reuse-aware static memory allocation for shortcut data, which maximizes on-chip data reuse under given resource constraints. From TensorFlow DNN models, the proposed design generates instruction sets for groups of nodes, using an optimized data-reuse scheme for each residual block. The accelerator design implemented on the Xilinx KCU1500 FPGA card is 2.8x faster and 9.9x more power efficient than NVIDIA RTX 2080 Ti for 256x256 input size. Compared to the result from baseline, in which the weights, inputs, and outputs are accessed from…
