Kernel-Segregated Transpose Convolution Operation
Vijay Srinivas Tida, Sai Venkatesh Chilukoti, Xiali Hei, Sonya Hsu

TL;DR
This paper introduces a kernel segregation technique to optimize transpose convolution, significantly reducing computation time and resource usage in deep learning applications without extra hardware.
Contribution
The paper presents a novel kernel segregation algorithm that accelerates transpose convolution by reducing unnecessary computations and memory use, applicable across hardware platforms.
Findings
3.09x faster on GPU with flower dataset
2.2x faster training on CPU with MNIST
Reduces memory and computation in transpose convolution
Abstract
Transpose convolution has shown prominence in many deep learning applications. However, transpose convolution layers are computationally intensive because the input feature map grows when zeros are added after each element in each row and column. Convolution on this expanded input feature map therefore leads to poor utilization of hardware resources. The main source of unnecessary multiplication operations is the zeros at predefined positions in the input feature map. To solve these problems, we propose an algorithmic-level optimization technique for an efficient transpose convolution implementation. Based on kernel activations, we segregated the original kernel into four sub-kernels. This scheme could reduce memory requirements and unnecessary multiplications. Our proposed method achieved faster computation using the Titan X GPU (Intel Dual Core CPU) with a flower…
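The idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the choice of a zero after every element (giving a 2N x 2N expanded map), the "valid" stride-1 correlation, and the parity-based split of the kernel into four sub-kernels are all assumptions made for illustration. The sketch compares the conventional route (expand the input with zeros, then convolve) against a segregated route that never materializes the zeros and picks one of four sub-kernels per output position.

```python
def upsample_with_zeros(x):
    """Place a zero after each element in each row and column (assumed layout)."""
    rows, cols = len(x), len(x[0])
    u = [[0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            u[2 * i][2 * j] = x[i][j]
    return u


def correlate_valid(u, k):
    """Plain stride-1 'valid' correlation of u with a square kernel k."""
    size = len(k)
    out_rows = len(u) - size + 1
    out_cols = len(u[0]) - size + 1
    return [
        [
            sum(u[r + a][c + b] * k[a][b] for a in range(size) for b in range(size))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def segregate_kernel(k):
    """Split k into four sub-kernels keyed by (row parity, column parity)."""
    return {
        (p, q): [row[q::2] for row in k[p::2]]
        for p in (0, 1)
        for q in (0, 1)
    }


def segregated_transpose_conv(x, k):
    """Same result as correlate_valid(upsample_with_zeros(x), k), without the zeros."""
    size = len(k)
    subs = segregate_kernel(k)
    out_rows = 2 * len(x) - size + 1
    out_cols = 2 * len(x[0]) - size + 1
    y = []
    for r in range(out_rows):
        p = r % 2  # only kernel rows with this parity meet non-zero inputs
        out_row = []
        for c in range(out_cols):
            q = c % 2  # same argument for kernel columns
            sub = subs[(p, q)]
            r0, c0 = (r + p) // 2, (c + q) // 2  # top-left index in the original input
            out_row.append(
                sum(
                    x[r0 + a][c0 + b] * sub[a][b]
                    for a in range(len(sub))
                    for b in range(len(sub[0]))
                )
            )
        y.append(out_row)
    return y


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    k = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    naive = correlate_valid(upsample_with_zeros(x), k)
    fast = segregated_transpose_conv(x, k)
    print(naive == fast)
```

Under these assumptions, a 3 x 3 kernel splits into sub-kernels of sizes 2 x 2, 2 x 1, 1 x 2, and 1 x 1, so each output element needs at most 4 multiplications instead of 9, and the expanded feature map is never stored. The paper's exact padding and indexing conventions may differ from this sketch.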
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Stochastic Gradient Optimization Techniques
Methods: Convolution
