Accelerating Transposed Convolutions on FPGA-based Edge Devices

Jude Haris; José Cano

arXiv:2507.07683·cs.AR·July 11, 2025

Open Access

TL;DR

This paper introduces MM2IM, a hardware-software co-designed accelerator for transposed convolutions on FPGA-based edge devices. It reports an average 1.9x speedup over a dual-thread ARM Neon optimized CPU baseline, and up to 3x speedup and 2.4x energy reduction on GAN models.

Contribution

The paper presents MM2IM, a novel accelerator combining matrix multiplication with col2IM, optimized for resource-constrained edge devices to enhance transposed convolution performance.

Findings

01. Achieves a 1.9x average speedup over a dual-thread ARM Neon optimized CPU baseline across 261 TCONV problem configurations.

02. Up to 4.2x speedup on generative model layers.

03. Up to 3x speedup and 2.4x energy reduction on GAN models.

Abstract

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping, overlapping sums, and ineffectual computations. These inefficiencies further exacerbate the performance bottleneck of TCONV and generative models on resource-constrained edge devices. To address this problem, in this paper we propose MM2IM, a hardware-software co-designed accelerator that combines Matrix Multiplication (MatMul) with col2IM to process TCONV layers on resource-constrained edge devices efficiently. Using the SECDA-TFLite design toolkit, we implement MM2IM and evaluate its performance across 261 TCONV problem configurations, achieving an average speedup of 1.9x against a dual-thread ARM Neon optimized CPU baseline. We then evaluate the performance…
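
To make the MatMul plus col2IM idea concrete, the following is a minimal software sketch of a transposed convolution computed as one matrix multiplication followed by a scatter-add (col2im) step. It is not the MM2IM hardware design: the function names, the tensor layouts (`x[H][W][Cin]`, `weights[Cin][Kh][Kw][Cout]`), and the `dropped` counter are assumptions chosen for illustration. The counter tallies products that land outside the cropped output, which is one plausible reading of the "ineffectual computations" mentioned above; the `+=` in the scatter step is where overlapping sums occur.

```python
def matmul(a, b):
    """Plain-Python product of a (m x k) and b (k x n), both lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def tconv_matmul_col2im(x, weights, stride, pad):
    """Transposed convolution as MatMul followed by col2im.

    x:       [H][W][Cin]          (assumed layout)
    weights: [Cin][Kh][Kw][Cout]  (assumed layout)
    Returns (out, dropped), where out is [Ho][Wo][Cout] and dropped counts
    products that fall outside the output and are discarded.
    """
    h, wd, cin = len(x), len(x[0]), len(x[0][0])
    kh, kw, cout = len(weights[0]), len(weights[0][0]), len(weights[0][0][0])
    ho = (h - 1) * stride + kh - 2 * pad
    wo = (wd - 1) * stride + kw - 2 * pad

    # Step 1 (MatMul): (H*W x Cin) @ (Cin x Kh*Kw*Cout).
    x_mat = [x[i][j] for i in range(h) for j in range(wd)]
    w_mat = [
        [weights[c][a][b][o] for a in range(kh) for b in range(kw) for o in range(cout)]
        for c in range(cin)
    ]
    cols = matmul(x_mat, w_mat)

    # Step 2 (col2im): scatter-add each input pixel's Kh x Kw x Cout patch.
    out = [[[0] * cout for _ in range(wo)] for _ in range(ho)]
    dropped = 0
    for i in range(h):
        for j in range(wd):
            patch = cols[i * wd + j]
            for a in range(kh):
                for b in range(kw):
                    oy = i * stride - pad + a
                    ox = j * stride - pad + b
                    if 0 <= oy < ho and 0 <= ox < wo:
                        for o in range(cout):
                            # Neighbouring patches overlap, hence the accumulation.
                            out[oy][ox][o] += patch[(a * kw + b) * cout + o]
                    else:
                        dropped += cout  # computed in step 1 but never used
    return out, dropped


if __name__ == "__main__":
    x = [[[1], [2]], [[3], [4]]]          # 2 x 2 input, one channel
    ones = [[[[1]] * 3 for _ in range(3)]]  # 3 x 3 all-ones kernel, Cin = Cout = 1
    out, dropped = tconv_matmul_col2im(x, ones, stride=2, pad=1)
    print([[px[0] for px in row] for row in out], dropped)
```

In this toy case the 2 x 2 input is upscaled to 3 x 3, and more than half of the products from the MatMul step are discarded by the crop, which illustrates why a naive mapping wastes work when padding is non-zero.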

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Numerical Methods and Algorithms