Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Fareed Qararyah, Muhammad Waqar Azhar, Mohammad Ali Maleki, Pedro Trancoso

TL;DR
This paper introduces Fused Convolutional Modules (FCMs) and FusePlanner to optimize GPU execution of depthwise and pointwise convolutions, significantly reducing memory accesses and improving speed and energy efficiency in CNNs and ViTs.
Contribution
The paper presents novel GPU kernels that fuse depthwise and pointwise convolutions, along with FusePlanner, a cost-model-based planner that selects the optimal fusion. Together they address limitations of prior GPU-based convolution fusion.
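The following is a minimal CPU sketch in pure Python that makes the fusion idea concrete. It assumes a 3×3 depthwise kernel with stride 1 and zero padding, CHW nested-list tensors, and square spatial tiles. The function names (`dw_at`, `depthwise3x3`, `pointwise`, `fused_dw_pw`) and the tiling scheme are hypothetical illustrations, not the paper's FCM kernels. In the unfused path the full C×H×W depthwise output is materialized, which on a GPU means writing it to global memory and reading it back. In the fused path only one tile of that output exists at a time, standing in for data kept in shared memory or registers.

```python
# Illustrative sketch (not the paper's FCM kernels): depthwise 3x3 conv
# (stride 1, zero padding 1) followed by a pointwise 1x1 conv.
# Tensors are nested lists in CHW layout: x[c][h][w].


def dw_at(x, k, c, h, w):
    """Depthwise 3x3 output for channel c at pixel (h, w); k[c][i][j] is per-channel."""
    H, W = len(x[0]), len(x[0][0])
    acc = 0.0
    for i in range(3):
        for j in range(3):
            hh, ww = h + i - 1, w + j - 1
            if 0 <= hh < H and 0 <= ww < W:  # zero padding: skip out-of-bounds taps
                acc += x[c][hh][ww] * k[c][i][j]
    return acc


def depthwise3x3(x, k):
    """Unfused depthwise pass: materializes the full C x H x W intermediate."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[dw_at(x, k, c, h, w) for w in range(W)] for h in range(H)] for c in range(C)]


def pointwise(x, wpw):
    """Pointwise 1x1 conv: wpw[m][c] mixes C input channels into M outputs."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(wpw[m][c] * x[c][h][w] for c in range(C)) for w in range(W)] for h in range(H)]
        for m in range(len(wpw))
    ]


def fused_dw_pw(x, k, wpw, tile=2):
    """Fused pass: the depthwise result exists only one spatial tile at a time."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M = len(wpw)
    y = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for h0 in range(0, H, tile):
        for w0 in range(0, W, tile):
            hs = range(h0, min(h0 + tile, H))
            ws = range(w0, min(w0 + tile, W))
            # Per-tile buffer: stand-in for shared memory / registers on a GPU.
            buf = [[[dw_at(x, k, c, h, w) for w in ws] for h in hs] for c in range(C)]
            for m in range(M):
                for ih, h in enumerate(hs):
                    for iw, w in enumerate(ws):
                        y[m][h][w] = sum(wpw[m][c] * buf[c][ih][iw] for c in range(C))
    return y


if __name__ == "__main__":
    C, H, W, M = 3, 5, 5, 2
    x = [[[float(c * 25 + h * 5 + w) for w in range(W)] for h in range(H)] for c in range(C)]
    k = [[[float((c + i + j) % 3 - 1) for j in range(3)] for i in range(3)] for c in range(C)]
    wpw = [[float(m - c) for c in range(C)] for m in range(M)]
    unfused = pointwise(depthwise3x3(x, k), wpw)
    print(unfused == fused_dw_pw(x, k, wpw, tile=2))  # prints True
```

Each pointwise output pixel depends only on the depthwise outputs at the same pixel. As a result, the pointwise side needs no halo around a tile. The depthwise side still reads a one-pixel halo of the input around each tile.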
Findings
FCMs reduce memory accesses by up to 83%.
FCMs achieve up to 3.7x speedup over cuDNN.
Complete models outperform TVM with up to 1.8x speedup and two-thirds energy savings.
Abstract
Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion suffers from one or more of the following: (1) fusing either a convolution with an element-wise or multiple non-convolutional operators, (2) not explicitly optimizing for memory accesses, (3) not supporting depthwise convolutions. This paper proposes Fused Convolutional Modules (FCMs), a set…
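The compute-to-memory-access argument in the abstract can be checked with a back-of-envelope count. The sketch below is a simplified cost model of our own and is not the paper's FusePlanner model. It counts multiply-accumulates (MACs) and the minimum global-memory traffic for a standard, depthwise, pointwise, and fused depthwise+pointwise layer. It assumes stride 1, "same" padding, 4-byte (fp32) values, and that every tensor crosses global memory exactly once. The helper names and the layer shape (H = W = 56, C = M = 128, K = 3) are hypothetical and chosen only for illustration.

```python
# Back-of-envelope cost model (an assumption, not the paper's FusePlanner model):
# count multiply-accumulates (MACs) and the minimum global-memory traffic,
# assuming every tensor crosses memory exactly once, stride 1, "same" padding,
# and 4-byte (fp32) values.
BYTES = 4


def standard_conv_cost(H, W, C, M, K):
    macs = H * W * K * K * C * M
    elems = H * W * C + K * K * C * M + H * W * M  # input + weights + output
    return macs, BYTES * elems


def depthwise_conv_cost(H, W, C, K):
    macs = H * W * K * K * C  # one KxK filter per channel, no channel mixing
    elems = H * W * C + K * K * C + H * W * C
    return macs, BYTES * elems


def pointwise_conv_cost(H, W, C, M):
    macs = H * W * C * M  # 1x1 filter that only mixes channels
    elems = H * W * C + C * M + H * W * M
    return macs, BYTES * elems


def fused_dw_pw_cost(H, W, C, M, K):
    dw_macs, dw_bytes = depthwise_conv_cost(H, W, C, K)
    pw_macs, pw_bytes = pointwise_conv_cost(H, W, C, M)
    # Fusion skips writing the H*W*C intermediate and reading it back.
    saved = 2 * BYTES * H * W * C
    return dw_macs + pw_macs, dw_bytes + pw_bytes - saved


if __name__ == "__main__":
    H, W, C, M, K = 56, 56, 128, 128, 3  # illustrative layer shape
    rows = [
        ("standard", standard_conv_cost(H, W, C, M, K)),
        ("depthwise", depthwise_conv_cost(H, W, C, K)),
        ("pointwise", pointwise_conv_cost(H, W, C, M)),
        ("fused", fused_dw_pw_cost(H, W, C, M, K)),
    ]
    for name, (macs, nbytes) in rows:
        print(f"{name:>9}: {macs:>11,} MACs {nbytes:>10,} bytes {macs / nbytes:7.2f} MAC/byte")
```

For this shape, the standard convolution performs about 122 MACs per byte moved, while the depthwise convolution performs only about 1.1. The depthwise layer is therefore bound by memory traffic. Fusing the depthwise and pointwise layers cuts their combined traffic from 6,492,672 to 3,281,408 bytes, roughly half, because in this model the intermediate tensor is never written to or read from global memory. Real kernels also re-read halos and weights, so these traffic figures are lower bounds.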
