Efficient Accelerator for Dilated and Transposed Convolution with Decomposition

Kuo-Wei Chang and Tian-Sheuan Chang

arXiv:2205.02103 · cs.AR · May 5, 2022

TL;DR

This paper introduces a decomposition-based accelerator design for dilated and transposed convolutions. By skipping redundant computations, it runs both convolution types efficiently on existing dense CNN hardware, cutting cycle counts by 87.8% (an 8.2x speedup over naive execution) for the ENet case.

Contribution

It presents a decomposition method that lets dilated and transposed convolutions execute efficiently on dense CNN hardware, avoiding both the convolution-specific datapaths and the complex reconfiguration control of earlier designs.
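As an illustration of the idea for the dilated case, the sketch below uses a 1-D signal and one common formulation of input decomposition: the input is split into `d` phases, each phase is fed through an ordinary dense convolution with the original kernel, and the results are interleaved. The function names and the 1-D setting are assumptions made for this example; the paper's exact scheme and its 2-D hardware mapping may differ.

```python
def dense_conv1d(x, w):
    """Plain 'valid' 1-D correlation, standing in for a dense CNN engine."""
    k = len(w)
    return [sum(w[t] * x[i + t] for t in range(k)) for i in range(len(x) - k + 1)]


def dilated_naive(x, w, d):
    """Naive execution: insert d-1 zeros between kernel taps, then run dense.

    Most multiplications hit an inserted zero, which is the redundant work.
    """
    wz = [0] * ((len(w) - 1) * d + 1)
    wz[::d] = w
    return dense_conv1d(x, wz)


def dilated_decomposed(x, w, d):
    """Input decomposition (hypothetical helper): one dense conv per phase."""
    out_len = len(x) - (len(w) - 1) * d
    y = [0] * out_len
    for p in range(d):
        # Phase p of the input only ever contributes to phase p of the output.
        y[p::d] = dense_conv1d(x[p::d], w)
    return y


x = list(range(1, 13))
w = [1, -2, 3]
same = dilated_naive(x, w, 2) == dilated_decomposed(x, w, 2)
```

Both routines produce the same output, but the decomposed version performs only `len(w)` multiplications per output sample instead of `(len(w) - 1) * d + 1`.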

Findings

01

Achieves an 87.8% reduction in cycle counts for the ENet case

02

Provides 8.2x speedup over naive execution

03

Compatible with existing dense CNN hardware
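The first two findings are consistent with each other if the speedup is taken to be the ratio of naive cycles to remaining cycles (an assumption about how the figure is defined, not something stated on this page). A quick check:

```python
# Assumption: speedup = naive_cycles / decomposed_cycles.
cycle_reduction = 0.878               # fraction of cycles removed
remaining = 1.0 - cycle_reduction     # fraction of cycles still executed
speedup = 1.0 / remaining
print(f"{speedup:.1f}x")              # prints "8.2x"
```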

Abstract

Hardware acceleration for dilated and transposed convolution enables real-time execution of related tasks such as segmentation, but current designs are either specific to these convolution types or, when reconfigurable, suffer from complex control. This paper presents a design that decomposes the input for dilated convolution and the weight for transposed convolution to skip redundant computations, so that both execute efficiently on existing dense CNN hardware. The proposed architecture cuts cycle counts by 87.8%, an 8.2x speedup over naive execution for the ENet case.
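For the transposed case, weight decomposition can be sketched in 1-D as follows. This is a minimal example under stated assumptions: the kernel is split into `s` sub-kernels by stride phase, each sub-kernel is applied to the original input with a dense convolution, and the outputs are interleaved. The helper names are hypothetical, the kernel length is assumed to be at least the stride, and the paper's own 2-D scheme may be organized differently.

```python
def dense_conv1d(x, w):
    """Plain 'valid' 1-D correlation, standing in for a dense CNN engine."""
    k = len(w)
    return [sum(w[t] * x[i + t] for t in range(k)) for i in range(len(x) - k + 1)]


def full_conv1d(x, w):
    """Full convolution on the dense engine: zero-pad the input, flip the kernel."""
    pad = [0] * (len(w) - 1)
    return dense_conv1d(pad + x + pad, w[::-1])


def transposed_naive(x, w, s):
    """Naive execution: insert s-1 zeros between input samples, then run dense.

    Most multiplications hit an inserted zero, which is the redundant work.
    """
    up = [0] * ((len(x) - 1) * s + 1)
    up[::s] = x
    return full_conv1d(up, w)


def transposed_decomposed(x, w, s):
    """Weight decomposition (hypothetical helper): one dense conv per sub-kernel."""
    assert len(w) >= s, "sketch assumes kernel length >= stride"
    y = [0] * ((len(x) - 1) * s + len(w))
    for r in range(s):
        # Sub-kernel w[r::s] produces exactly the outputs at positions r, r+s, ...
        y[r::s] = full_conv1d(x, w[r::s])
    return y


x = [1, 2, 3, 4]
w = [1, 2, 3]
same = transposed_naive(x, w, 2) == transposed_decomposed(x, w, 2)
```

The decomposed version never touches an inserted zero: every multiplication pairs a real input sample with a real kernel tap.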

Taxonomy

Methods: 1x1 Convolution · Dilated Convolution · Batch Normalization · ENet Initial Block · ENet Bottleneck · Max Pooling · ENet Dilated Bottleneck · SpatialDropout · Transposed Convolution · Convolution