Accelerating Transposed Convolutions on FPGA-based Edge Devices
Jude Haris, José Cano

TL;DR
This paper introduces MM2IM, a hardware-software co-designed accelerator for transposed convolutions on FPGA-based edge devices. It reports an average 1.9x speedup over a dual-thread ARM Neon optimized CPU baseline and up to 2.4x energy reduction on GAN models.
Contribution
The paper presents MM2IM, a novel accelerator combining matrix multiplication with col2IM, optimized for resource-constrained edge devices to enhance transposed convolution performance.
Findings
Achieves a 1.9x average speedup across 261 TCONV problem configurations over a dual-thread ARM Neon optimized CPU baseline.
Up to 4.2x speedup on generative model layers.
Up to 3x speedup and 2.4x energy reduction on GAN models.
Abstract
Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping, overlapping sums, and ineffectual computations. These inefficiencies further exacerbate the performance bottleneck of TCONV and generative models on resource-constrained edge devices. To address this problem, in this paper we propose MM2IM, a hardware-software co-designed accelerator that combines Matrix Multiplication (MatMul) with col2IM to process TCONV layers on resource-constrained edge devices efficiently. Using the SECDA-TFLite design toolkit, we implement MM2IM and evaluate its performance across 261 TCONV problem configurations, achieving an average speedup of 1.9x against a dual-thread ARM Neon optimized CPU baseline. We then evaluate the performance…
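To make the MatMul plus col2IM formulation concrete, here is a minimal software sketch in plain Python. It illustrates the general technique only and is not the MM2IM hardware design: the function names, the tensor layouts (input as [ih][iw][ic], filter as [kh][kw][oc][ic]), and the symmetric padding convention are all assumptions chosen for readability.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def tconv_mm_col2im(x, w, stride, pad):
    """Transposed convolution as a MatMul followed by col2im (illustrative).

    x: input  indexed [ih][iw][ic]      (assumed layout)
    w: filter indexed [kh][kw][oc][ic]  (assumed layout)
    Returns output indexed [oh][ow][oc], where
    oh = (ih - 1) * stride + kh - 2 * pad (and likewise for ow).
    """
    ih, iw, ic = len(x), len(x[0]), len(x[0][0])
    kh, kw, oc = len(w), len(w[0]), len(w[0][0])
    oh = (ih - 1) * stride + kh - 2 * pad
    ow = (iw - 1) * stride + kw - 2 * pad

    # MatMul operands: one row per input pixel, one column per (kh, kw, oc) tap.
    a = [x[r][c] for r in range(ih) for c in range(iw)]            # (ih*iw, ic)
    b = [[w[p][q][o][i] for p in range(kh) for q in range(kw) for o in range(oc)]
         for i in range(ic)]                                       # (ic, kh*kw*oc)
    cols = matmul(a, b)                                            # (ih*iw, kh*kw*oc)

    # col2im: scatter-add every partial result into its output pixel.
    out = [[[0] * oc for _ in range(ow)] for _ in range(oh)]
    for r in range(ih):
        for c in range(iw):
            row = cols[r * iw + c]
            for p in range(kh):
                for q in range(kw):
                    y = r * stride + p - pad
                    z = c * stride + q - pad
                    # Results that fall outside the output are cropped by padding.
                    if 0 <= y < oh and 0 <= z < ow:
                        for o in range(oc):
                            out[y][z][o] += row[(p * kw + q) * oc + o]
    return out
```

For a single-channel 2x2 input with a 2x2 all-ones filter, stride 1 and no padding, all four input pixels contribute to the centre of the 3x3 output. That is the kind of overlapping sum the abstract refers to. With padding, some MatMul results are computed and then discarded by the bounds check.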
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Numerical Methods and Algorithms
