Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs

Fareed Qararyah; Muhammad Waqar Azhar; Mohammad Ali Maleki; Pedro Trancoso

arXiv:2404.19331 · cs.PF · August 6, 2024

TL;DR

This paper introduces Fused Convolutional Modules (FCMs) and FusePlanner to optimize GPU execution of depthwise and pointwise convolutions, significantly reducing memory accesses and improving speed and energy efficiency in CNNs and ViTs.
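To make the idea of fusion concrete, here is a minimal, framework-free sketch in Python. It assumes stride 1, no padding, batch size 1, and nested lists as tensors, and the function names are hypothetical. It is not the paper's FCM kernels; it only illustrates the general principle that the depthwise output can be consumed by the pointwise step pixel by pixel, so the full intermediate tensor never has to be written to and re-read from global memory.

```python
from typing import List

# Activations are [channels][rows][cols]; nested lists keep the sketch dependency-free.
Tensor3 = List[List[List[float]]]


def depthwise(x: Tensor3, w: Tensor3) -> Tensor3:
    """Depthwise KxK conv, stride 1, no padding: one filter per channel."""
    C, H, W, K = len(x), len(x[0]), len(x[0][0]), len(w[0])
    Ho, Wo = H - K + 1, W - K + 1
    return [[[sum(x[c][i + a][j + b] * w[c][a][b]
                  for a in range(K) for b in range(K))
              for j in range(Wo)] for i in range(Ho)] for c in range(C)]


def pointwise(d: Tensor3, p: List[List[float]]) -> Tensor3:
    """1x1 conv: output channel m is a weighted sum over all input channels."""
    C, Ho, Wo = len(d), len(d[0]), len(d[0][0])
    return [[[sum(p[m][c] * d[c][i][j] for c in range(C))
              for j in range(Wo)] for i in range(Ho)] for m in range(len(p))]


def fused_dw_pw(x: Tensor3, w: Tensor3, p: List[List[float]]) -> Tensor3:
    """Same result as pointwise(depthwise(x, w), p), but the full
    C x Ho x Wo intermediate tensor is never materialized."""
    C, H, W, K, M = len(x), len(x[0]), len(x[0][0]), len(w[0]), len(p)
    Ho, Wo = H - K + 1, W - K + 1
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(M)]
    for i in range(Ho):
        for j in range(Wo):
            # Depthwise results for this single pixel only (C values); in a
            # GPU kernel these would live in registers or shared memory.
            local = [sum(x[c][i + a][j + b] * w[c][a][b]
                         for a in range(K) for b in range(K))
                     for c in range(C)]
            # Consume them immediately in the pointwise step.
            for m in range(M):
                out[m][i][j] = sum(p[m][c] * local[c] for c in range(C))
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
w = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
p = [[1, 2], [3, -1]]
assert fused_dw_pw(x, w, p) == pointwise(depthwise(x, w), p)
```

A real GPU kernel would tile over groups of output pixels and channels rather than single pixels, and neighboring tiles would reload overlapping input (halo) regions; this sketch ignores those trade-offs.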

Contribution

The paper presents novel GPU kernels that fuse depthwise and pointwise convolutions, together with a cost-model-based planner that decides which layers to fuse, addressing limitations of prior GPU-based convolution fusion.
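FusePlanner's internal search is not described on this page. As a hedged illustration of what a cost-model-based fusion planner can look like, the sketch below takes per-layer cost estimates and a hypothetical `fused_cost` function, then uses a simple dynamic program over a linear chain of layers to choose which depthwise→pointwise pairs to fuse. `Layer`, `plan_fusion`, and `fused_cost` are illustrative names; the costs could stand for estimated memory accesses or latency; and only dw→pw pairs are considered, which is an assumption rather than the paper's set of fusion patterns.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Layer:
    name: str
    kind: str    # "dw" (depthwise) or "pw" (pointwise)
    cost: float  # estimated cost when run as a standalone kernel


def plan_fusion(layers: List[Layer],
                fused_cost: Callable[[Layer, Layer], float]
                ) -> Tuple[List[Tuple[str, ...]], float]:
    """Pick non-overlapping dw->pw pairs to fuse so the total estimated cost is minimal."""
    n = len(layers)
    best = [0.0] * (n + 1)    # best[i]: minimal cost of executing layers[i:]
    fuse = [False] * (n + 1)  # fuse[i]: whether layers[i] and layers[i+1] are fused
    for i in range(n - 1, -1, -1):
        best[i] = layers[i].cost + best[i + 1]  # option 1: run layer i on its own
        if i + 1 < n and layers[i].kind == "dw" and layers[i + 1].kind == "pw":
            # option 2: fuse layer i with layer i+1 into a single kernel
            candidate = fused_cost(layers[i], layers[i + 1]) + best[i + 2]
            if candidate < best[i]:
                best[i], fuse[i] = candidate, True
    # Walk the decisions forward to build the list of kernels to launch.
    plan: List[Tuple[str, ...]] = []
    i = 0
    while i < n:
        if fuse[i]:
            plan.append((layers[i].name, layers[i + 1].name))
            i += 2
        else:
            plan.append((layers[i].name,))
            i += 1
    return plan, best[0]


chain = [Layer("a", "pw", 5), Layer("b", "dw", 4), Layer("c", "pw", 6), Layer("d", "dw", 3)]
plan, total = plan_fusion(chain, lambda l1, l2: (l1.cost + l2.cost) / 2)
```

With this toy cost model, the planner fuses only the middle dw→pw pair (`b`, `c`), for a total estimated cost of 13.0. If fusing were estimated to be more expensive than running the two layers separately, every layer would stay a standalone kernel.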

Findings

1. FCMs reduce memory accesses by up to 83%.
2. FCMs achieve up to 3.7x speedup over cuDNN.
3. End-to-end model implementations outperform TVM, with up to 1.8x speedup, and save up to two-thirds of the energy.
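Where do the memory savings come from? A back-of-the-envelope count shows that fusing a depthwise layer with the pointwise layer that follows it removes exactly one write and one read of the intermediate tensor. The count rests on simplifying assumptions: stride 1, "same" padding, every tensor read or written exactly once, no caching or halo reloads, and counts in elements rather than bytes. The function names are illustrative. This is not the paper's cost model and is not meant to reproduce the reported figures.

```python
def unfused_accesses(C: int, H: int, W: int, K: int, M: int) -> int:
    """Elements moved to/from global memory by separate DW and PW kernels."""
    dw = C * H * W + C * K * K + C * H * W  # read input, read DW weights, write intermediate
    pw = C * H * W + M * C + M * H * W      # read intermediate, read PW weights, write output
    return dw + pw


def fused_accesses(C: int, H: int, W: int, K: int, M: int) -> int:
    """Same layer pair, fused: the intermediate stays in on-chip memory."""
    return C * H * W + C * K * K + M * C + M * H * W


def reduction(C: int, H: int, W: int, K: int, M: int) -> float:
    """Fraction of global-memory accesses removed by fusion under this model."""
    return 1 - fused_accesses(C, H, W, K, M) / unfused_accesses(C, H, W, K, M)
```

The difference `unfused_accesses(...) - fused_accesses(...)` is always `2 * C * H * W`. For a hypothetical layer with C = 96, H = W = 56, K = 3 and M = 24, `reduction` gives roughly 0.61. The saving grows as the intermediate tensor dominates the total traffic.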

Abstract

Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, making their memory accesses often the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion suffers from one or more of the following: (1) fusing either a convolution with an element-wise or multiple non-convolutional operators, (2) not explicitly optimizing for memory accesses, (3) not supporting depthwise convolutions. This paper proposes Fused Convolutional Modules (FCMs), a set…
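The abstract's point about a lower compute-to-memory-access ratio can be checked with a quick count. The sketch below compares multiply-accumulates (MACs) per element of global-memory traffic for a standard KxK convolution and for its depthwise and pointwise counterparts. It uses hypothetical helper names and assumes stride 1, "same" padding, batch size 1, every tensor touched exactly once, and no caches.

```python
def intensity(macs: int, elements: int) -> float:
    """MACs per element moved to or from global memory."""
    return macs / elements


def standard_conv(C: int, H: int, W: int, K: int, M: int) -> float:
    macs = H * W * K * K * C * M                   # every output channel sees every input channel
    mem = C * H * W + K * K * C * M + M * H * W    # input + weights + output
    return intensity(macs, mem)


def depthwise_conv(C: int, H: int, W: int, K: int) -> float:
    macs = H * W * K * K * C                       # one KxK filter per channel
    mem = C * H * W + K * K * C + C * H * W        # input + weights + output
    return intensity(macs, mem)


def pointwise_conv(C: int, H: int, W: int, M: int) -> float:
    macs = H * W * C * M                           # 1x1 filter across all channels
    mem = C * H * W + C * M + M * H * W            # input + weights + output
    return intensity(macs, mem)
```

For a hypothetical layer with C = M = 32, H = W = 16 and K = 3, this gives about 92 MACs per element for the standard convolution, about 15 for the pointwise convolution and about 4.4 for the depthwise convolution. The depthwise and pointwise layers do far less arithmetic per byte moved, which is why their memory accesses tend to become the bottleneck.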

Code & Models

Repositories

fqararyah/fusing_dw_and_pw_on_gpus · tf · Official
