Unified Kernel-Segregated Transpose Convolution Operation

Vijay Srinivas Tida; Md Imran Hossen; Liqun Shan; Sai Venkatesh Chilukoti; Sonya Hsu; Xiali Hei

arXiv:2502.20493·cs.LG·March 3, 2025

TL;DR

This paper introduces a unified kernel segregation method for transpose convolution that reduces memory and computational costs, achieving an average speedup of 2.03x on an RTX 2070 GPU and 3.89x on an Intel Xeon CPU, along with memory savings in GAN models.

Contribution

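The paper proposes a unified kernel segregation approach in which a single unified kernel executes the four sub-kernels of kernel-segregated transpose convolution. This avoids the extra elements that plain kernel segregation computes when the output feature map has odd dimensions, reducing both memory use and computation.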

Findings

01

Achieves an average speedup of 2.03x on an RTX 2070 GPU and 3.89x on an Intel Xeon CPU.

02

Reduces memory usage by up to 35 MB in GAN models.

03

Achieves an average 3.5x speedup in an ablation study on transpose convolution layers taken from well-known GANs.

Abstract

The optimization of the transpose convolution layer for deep learning applications is achieved with the kernel segregation mechanism. However, kernel segregation has disadvantages, such as computing extra elements to obtain the output feature map with odd dimensions while launching a thread. To mitigate this problem, we introduce a unified kernel segregation approach that limits the usage of memory and computational resources by employing one unified kernel to execute four sub-kernels. The findings reveal that the suggested approach achieves an average computational speedup of 2.03x (3.89x) when tested on specific datasets with an RTX 2070 GPU (Intel Xeon CPU). The ablation study shows an average computational speedup of 3.5x when evaluating the transpose convolution layers from well-known Generative Adversarial Networks (GANs). The implementation of the proposed method for the…
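The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; it assumes a single channel, a stride of 2, a square kernel, and cross-correlation (no kernel flip), and the function names and the `pad` parameter are hypothetical. The first function is the conventional approach: insert zeros between input elements, pad, and slide the full kernel. The second picks one of the four sub-kernels (even/odd rows crossed with even/odd columns of the kernel) from the parity of each output index, so it only multiplies by real input values and computes exactly the requested output elements, including when the output size is odd.

```python
import numpy as np


def naive_transpose_conv(x, k, pad):
    """Conventional stride-2 transpose convolution via zero insertion (assumed setup)."""
    H, W = x.shape
    K = k.shape[0]
    # Insert one zero between neighbouring input elements.
    up = np.zeros((2 * H - 1, 2 * W - 1), dtype=x.dtype)
    up[::2, ::2] = x
    # Zero-pad every border by `pad`, then slide the full K x K kernel.
    up = np.pad(up, pad)
    oh, ow = up.shape[0] - K + 1, up.shape[1] - K + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(up[i:i + K, j:j + K] * k)
    return out


def segregated_transpose_conv(x, k, pad):
    """Same result, but each output element uses only one of four sub-kernels."""
    H, W = x.shape
    K = k.shape[0]
    oh, ow = 2 * H + 2 * pad - K, 2 * W + 2 * pad - K
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        u0 = (pad - i) % 2  # kernel rows that line up with real input rows
        for j in range(ow):
            v0 = (pad - j) % 2  # kernel columns that line up with real input columns
            acc = 0.0
            for u in range(u0, K, 2):
                r = (i + u - pad) // 2  # row in the original input (exact division)
                if r < 0 or r >= H:
                    continue  # falls in the zero padding
                for v in range(v0, K, 2):
                    c = (j + v - pad) // 2
                    if c < 0 or c >= W:
                        continue
                    acc += k[u, v] * x[r, c]
            out[i, j] = acc
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
k = rng.standard_normal((3, 3))
# A 3x3 kernel with pad=2 gives an odd-sized output (9 x 11 here).
print(np.allclose(naive_transpose_conv(x, k, 2), segregated_transpose_conv(x, k, 2)))
```

Because the sub-kernel is chosen per output element, the sketch never computes elements outside the output feature map. A scheme that instead has each thread produce a full 2x2 block from all four sub-kernels would overshoot by one row and one column when the output dimensions are odd, which is the overhead the abstract describes.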
