Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming

Jinuk Kim; Yeonwoo Jeong; Deokjae Lee; Hyun Oh Song

arXiv:2301.12187 · cs.LG · June 5, 2023


TL;DR

This paper introduces a two-stage dynamic programming approach for efficient CNN depth compression that merges convolution layers to reduce inference latency, outperforming existing methods in speed and accuracy.

Contribution

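It formulates depth compression as a subset selection problem: which activation layers to replace with identity functions, and which consecutive convolutions to merge. Because that problem is NP-hard, it solves a surrogate formulation with two-stage dynamic programming, enabling faster and more accurate CNN inference.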
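
The two decisions in that formulation can be separated into a two-stage dynamic program. Below is a minimal sketch in that spirit, not the authors' implementation (the official repository listed under Code & Models has that). It assumes per-span latency and importance tables are already measured. The table names `T` and `I`, the integer latency units, the toy numbers, and both function names are hypothetical. Stage 1 finds the cheapest way to merge layers inside a span that keeps no activation. Stage 2 chooses which activation boundaries to keep under a latency budget.

```python
def min_merge_latency(T):
    """Stage 1 (sketch): best[i][j] is the lowest latency for layers i+1..j
    when no activation is kept inside the span, so they may be grouped into
    any sequence of merged convolutions. T[i][j] (i < j) is the assumed,
    pre-measured latency of merging layers i+1..j into ONE convolution."""
    n = len(T) - 1
    best = [[float("inf")] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = 0
        for j in range(i + 1, n + 1):
            # The last merged block covers layers k+1..j for some i <= k < j.
            best[i][j] = min(best[i][k] + T[k][j] for k in range(i, j))
    return best


def select_activations(best, I, budget):
    """Stage 2 (sketch): keep activations at boundaries 0 = b0 < ... < bm = n,
    maximizing the sum of I[b][b'] over consecutive boundaries subject to the
    sum of best[b][b'] <= budget. Latencies must be non-negative integers.
    Returns (score, boundaries), or None if the budget is infeasible."""
    n = len(best) - 1
    NEG = float("-inf")
    # A[j][t]: best score for layers 1..j with total latency <= t.
    A = [[NEG] * (budget + 1) for _ in range(n + 1)]
    A[0] = [0.0] * (budget + 1)
    choice = [[None] * (budget + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for t in range(budget + 1):
            for i in range(j):
                c = best[i][j]
                if c <= t and A[i][t - c] > NEG:
                    v = A[i][t - c] + I[i][j]
                    if v > A[j][t]:
                        A[j][t], choice[j][t] = v, i
    if A[n][budget] == NEG:
        return None
    # Walk the recorded choices back from the last layer.
    bounds, j, t = [n], n, budget
    while j > 0:
        i = choice[j][t]
        t -= best[i][j]
        j = i
        bounds.append(j)
    return A[n][budget], bounds[::-1]


# Toy 3-layer example with made-up numbers (not from the paper).
T = [[0, 2, 3, 5],
     [0, 0, 2, 3],
     [0, 0, 0, 2],
     [0, 0, 0, 0]]
I = [[0, 1.0, 1.5, 2.0],
     [0, 0, 1.0, 1.5],
     [0, 0, 0, 1.0],
     [0, 0, 0, 0]]
best = min_merge_latency(T)
print(select_activations(best, I, 6))  # (3.0, [0, 1, 2, 3])
print(select_activations(best, I, 5))  # (2.5, [0, 1, 3])
print(select_activations(best, I, 4))  # None
```

In this sketch, stage 1 is precomputed once per span. Stage 2 then only has to reason about where activations stay, which keeps its table to (layers + 1) × (budget + 1) cells.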

Findings

1. Achieves a 1.41x speed-up on MobileNetV2 with minimal accuracy loss.
2. Outperforms baseline methods in both inference speed and accuracy.
3. Reduces end-to-end inference latency of convolutional neural networks.

Abstract

Recent works on neural network pruning advocate that reducing the depth of the network is more effective in reducing run-time memory usage and inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a constricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate…
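
The merging step the abstract relies on can be checked numerically. Once the activation between two convolutions is replaced by the identity, the pair collapses into one equivalent convolution whose kernel size is k1 + k2 - 1. Below is a minimal NumPy sketch. It assumes stride 1, no padding, square kernels, and dense (non-grouped) convolutions, so it does not cover the general convolution operations the paper targets. The helper names `conv2d_valid` and `merge_convs` are hypothetical.

```python
import numpy as np


def conv2d_valid(x, w, b):
    """Cross-correlation as in PyTorch's Conv2d, stride 1, no padding.
    x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)."""
    c_out, _, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.empty((c_out, h, wd))
    for p in range(h):
        for q in range(wd):
            patch = x[:, p:p + k, q:q + k]  # (C_in, k, k)
            y[:, p, q] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return y


def merge_convs(w1, b1, w2, b2):
    """Fold conv2(conv1(x)) into a single conv. This is only valid when there
    is no nonlinearity between the two layers."""
    _, c_in, k1, _ = w1.shape
    c_out, _, k2, _ = w2.shape
    k = k1 + k2 - 1
    w = np.zeros((c_out, c_in, k, k))
    for s in range(k2):
        for t in range(k2):
            # Tap (s, t) of w2 mixes w1's output channels; its contribution
            # lands at offset (s, t) inside the larger merged kernel.
            w[:, :, s:s + k1, t:t + k1] += np.einsum("om,mcuv->ocuv", w2[:, :, s, t], w1)
    # b1 is constant per channel, so w2 sums over it like any constant input.
    b = b2 + np.einsum("omst,m->o", w2, b1)
    return w, b


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w1, b1 = rng.standard_normal((4, 3, 3, 3)), rng.standard_normal(4)
w2, b2 = rng.standard_normal((5, 4, 3, 3)), rng.standard_normal(5)
two_layer = conv2d_valid(conv2d_valid(x, w1, b1), w2, b2)
w, b = merge_convs(w1, b1, w2, b2)
one_layer = conv2d_valid(x, w, b)
print(w.shape)                            # (5, 3, 5, 5)
print(np.allclose(two_layer, one_layer))  # True
```

The merged kernel is larger (5x5 here, from two 3x3 kernels), so merging is not automatically faster. This is presumably why the method selects which spans to merge with respect to latency rather than merging everything.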

Code & Models

Repositories

snu-mllab/efficient-cnn-depth-compression (PyTorch, official)

Videos

Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming (SlidesLive)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition

Methods: Pruning · Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Average Pooling · 1x1 Convolution · Batch Normalization · Inverted Residual Block · Convolution