Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Lucian Petrica; Tobias Alonso; Mairin Kroes; Nicholas Fraser; Sorin Cotofana; Michaela Blott

arXiv:2011.07317·cs.AR·November 17, 2020

TL;DR

This paper introduces FCMP (Frequency Compensated Memory Packing), a design methodology that improves on-chip memory utilization in FPGA dataflow CNN accelerators, enabling cost-effective and flexible deployment of complex CNNs such as ResNet-50 with minimal throughput loss.

Contribution

The paper presents FCMP, a memory packing technique that improves on-chip memory (OCM) utilization efficiency in FPGA CNN accelerators without modifying the physical structure of the FPGA's OCM, making accelerators easier to port between devices and cheaper to deploy.

Findings

01. Achieved up to a 30% reduction in OCM utilization for CIFAR-10 accelerators.

02. Enabled porting of a ResNet-50 accelerator from the U250 to the U280 with minimal throughput loss.

03. Demonstrated improved flexibility and cost-effectiveness in FPGA CNN inference designs.

Abstract

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However, in these accelerators the shapes of parameter memories are dictated by throughput constraints and do not map well to the underlying OCM, which becomes an implementation bottleneck. In this work, we propose an accelerator design methodology, Frequency Compensated Memory Packing (FCMP), which improves the OCM utilization efficiency of dataflow accelerators with minimal reduction in throughput and no modifications to the physical structure of FPGA OCM. To validate our methodology, we apply it to several realizations of medium-sized CIFAR-10 inference accelerators and demonstrate up to 30% reduction in OCM utilization without loss of inference…
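
The mismatch between parameter-memory shapes and the underlying OCM can be illustrated with a rough calculation. The sketch below is not the FCMP algorithm; it only estimates how many 18 Kb block RAMs a single memory of a given width and depth would occupy, and what fraction of the allocated bits actually holds data. It assumes the usual Xilinx BRAM18 aspect ratios and a single aspect ratio per memory; the function names and the example memory sizes are hypothetical and chosen only for illustration.

```python
import math

# Xilinx-style 18 Kb block RAM aspect ratios as (depth, width) pairs.
# Assumption for illustration: widths 9/18/36 count the parity bits as data.
BRAM18_SHAPES = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]
BRAM18_BITS = 18 * 1024


def brams_needed(width_bits, depth):
    """Fewest BRAM18 primitives for one width x depth memory, using one aspect ratio."""
    return min(
        math.ceil(width_bits / w) * math.ceil(depth / d)
        for d, w in BRAM18_SHAPES
    )


def mapping_efficiency(width_bits, depth):
    """Fraction of the allocated BRAM bits that hold actual parameter data."""
    used = width_bits * depth
    allocated = brams_needed(width_bits, depth) * BRAM18_BITS
    return used / allocated


# Hypothetical parameter memories: (width in bits, depth in words).
for width, depth in [(36, 512), (64, 128), (256, 72)]:
    n = brams_needed(width, depth)
    eff = mapping_efficiency(width, depth)
    print(f"{width:4d} x {depth:4d}: {n} BRAM18, efficiency {eff:.0%}")
```

Under these assumptions, a 36 x 512 memory fills one BRAM18 exactly, while a wide, shallow 256 x 72 memory spreads over eight primitives and uses only one eighth of their capacity. Wide, shallow shapes of this kind are what a high-throughput layer tends to produce, and they are the inefficiency that memory packing aims to reduce.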
