Optimization of FPGA-based CNN Accelerators Using Metaheuristics
Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, and Ahmad Shawahna

TL;DR
This paper introduces an automated FPGA design methodology using metaheuristics to optimize CNN accelerators, significantly improving throughput over existing approaches on various FPGA platforms.
Contribution
It presents a novel metaheuristic-based framework for resource partitioning in FPGA CNN accelerators, enhancing performance and efficiency.
Findings
Achieves 1.31x to 2.37x higher throughput than state-of-the-art methods.
Effectively optimizes FPGA resource utilization for multiple CNN architectures.
Demonstrates promising results on Xilinx FPGA boards.
Abstract
In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields with accuracy that was not previously possible. However, this comes with extensive computational requirements, which leave general-purpose CPUs unable to deliver the desired real-time performance. At the same time, FPGAs have seen a surge in interest for accelerating CNN inference, owing to their ability to implement custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt than GPUs. The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each tailored to a subset of layers. However, the growing complexity of CNN architectures makes it more challenging to allocate the resources available on the target FPGA device for optimal performance. In this…
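The multi-CLP idea in the abstract can be illustrated with a small, hedged sketch. The abstract is truncated here, so the paper's actual metaheuristics and cost model are not shown; the code below is only a toy model that uses simulated annealing as one illustrative metaheuristic. It assumes each layer has a fixed amount of work (for example, multiply-accumulate operations), each CLP gets an integer number of compute units from a shared budget, CLPs run concurrently, and throughput is limited by the slowest CLP. All names (`epoch_cycles`, `neighbor`, `anneal`, `work`, `budget`) are hypothetical.

```python
import math
import random


def epoch_cycles(assign, units, work):
    """Cycles taken by the slowest CLP (toy model: CLPs run concurrently)."""
    load = [0] * len(units)
    for layer, clp in enumerate(assign):
        load[clp] += work[layer]
    return max(math.ceil(l / u) for l, u in zip(load, units))


def neighbor(assign, units, rng):
    """Return a slightly modified copy of the current partition."""
    assign, units = list(assign), list(units)
    if rng.random() < 0.5:
        # Move one randomly chosen layer to a randomly chosen CLP.
        assign[rng.randrange(len(assign))] = rng.randrange(len(units))
    else:
        # Shift one compute unit between two CLPs; the total stays fixed
        # and every CLP keeps at least one unit.
        src, dst = rng.sample(range(len(units)), 2)
        if units[src] > 1:
            units[src] -= 1
            units[dst] += 1
    return assign, units


def anneal(work, n_clps, budget, steps=20000, t0=1000.0, seed=0):
    """Simulated annealing over (layer assignment, units per CLP).

    Assumes n_clps >= 2 and budget >= n_clps.
    Returns (best_cost, best_assign, best_units).
    """
    rng = random.Random(seed)
    assign = [i % n_clps for i in range(len(work))]   # round-robin start
    units = [budget // n_clps] * n_clps
    units[0] += budget - sum(units)                   # hand out the remainder
    cost = epoch_cycles(assign, units, work)
    best = (cost, assign, units)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9         # linear cooling schedule
        cand_a, cand_u = neighbor(assign, units, rng)
        cand_cost = epoch_cycles(cand_a, cand_u, work)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign, units, cost = cand_a, cand_u, cand_cost
            if cost < best[0]:
                best = (cost, assign, units)
    return best
```

A call such as `anneal([100, 200, 300, 400], n_clps=2, budget=10)` returns the best cycle count found together with the layer-to-CLP assignment and the units given to each CLP. A real design flow would replace the single `units` number with the FPGA's actual resource types and a cycle-accurate layer model, which this sketch does not attempt.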
Taxonomy
Methods: Fire Module · Inception Module · 1x1 Convolution · Xavier Initialization · Softmax · Local Response Normalization · Dropout · Convolution · Global Average Pooling
