A Parameterizable Convolution Accelerator for Embedded Deep Learning Applications
Panagiotis Mousouliotis, Georgios Keramidas

TL;DR
This paper introduces a parameterizable CNN accelerator for embedded FPGA applications, optimized through HW/SW co-design and high-level synthesis to balance performance, latency, power, and area constraints.
Contribution
It presents a flexible, high-level synthesis-based design methodology for FPGA CNN accelerators that effectively manages multiple embedded application constraints.
Findings
Outperforms non-parameterized design approaches in the authors' experiments.
Can be extended to other types of DL applications.
HLS-based parameterization facilitates optimization across latency, power, area, and cost constraints.
Abstract
Convolutional neural network (CNN) accelerators implemented on Field-Programmable Gate Arrays (FPGAs) are typically designed with a primary focus on maximizing performance, often measured in giga-operations per second (GOPS). However, real-life embedded deep learning (DL) applications impose multiple constraints related to latency, power consumption, area, and cost. This work presents a hardware-software (HW/SW) co-design methodology in which a CNN accelerator is described using high-level synthesis (HLS) tools that ease the parameterization of the design, facilitating more effective optimizations across multiple design constraints. Our experimental results demonstrate that the proposed design methodology is able to outperform non-parameterized design approaches, and it can be easily extended to other types of DL applications.
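The GOPS metric mentioned in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: the function names, the layer dimensions, and the 10 ms latency are hypothetical values chosen for illustration. It assumes the common convention of counting each multiply-accumulate as two operations.

```python
def conv_ops(h_out, w_out, c_in, c_out, k):
    """Operation count of one convolution layer.

    Assumes a k x k kernel and counts each multiply-accumulate
    (MAC) as 2 operations (one multiply, one add).
    """
    macs = h_out * w_out * c_out * c_in * k * k
    return 2 * macs


def gops(ops, latency_s):
    """Throughput in giga-operations per second for a given latency."""
    return ops / latency_s / 1e9


# Hypothetical layer: 3x3 convolution, 64 -> 64 channels, 56x56 output map.
ops = conv_ops(h_out=56, w_out=56, c_in=64, c_out=64, k=3)

# Hypothetical measured latency of 10 ms for that layer.
print(f"{ops} ops, {gops(ops, 0.010):.2f} GOPS")
```

A throughput figure computed this way says nothing about power, area, or cost, which is why the abstract argues that GOPS alone is an incomplete design target for embedded applications.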
Taxonomy
Topics: Advanced Neural Network Applications · Embedded Systems Design Techniques · Numerical Methods and Algorithms
