Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition

Walid Ahmed; Habib Hajimolahoseini; Austin Wen; Yang Liu

arXiv:2309.12412·cs.CV·September 25, 2023

TL;DR

This paper explores hardware-aware low rank decomposition to compress ResNet50, achieving notable training and inference speedups on specific hardware with minimal accuracy loss.

Contribution

It introduces a hardware-aware compression method using low rank decomposition tailored for different hardware systems, improving speed while maintaining accuracy.

Findings

01. 5.36% training speedup on Ascend910

02. 15.79% inference speedup on Ascend310

03. 1% accuracy drop compared to the original uncompressed model

Abstract

Compressing a neural network can speed up both its training and its inference. In this research, we study compression by applying low rank decomposition to network layers. Our research demonstrates that, to obtain a speedup, the compression methodology must be aware of the underlying hardware, since analysis is needed to choose which layers to compress. The advantage of our approach is demonstrated via a case study of compressing ResNet50 and training on the full ImageNet-ILSVRC2012 dataset. We tested on two different hardware systems: Nvidia V100 and Huawei Ascend910. With hardware-targeted compression, results showed a 5.36% training speedup on Ascend910 and a 15.79% inference speedup on Ascend310, with only a 1% drop in accuracy compared to the original uncompressed model.
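
To make the idea of low rank decomposition of a layer concrete, here is a minimal sketch in Python with NumPy. The abstract does not say which factorization the authors use, so truncated SVD is an assumption here, and the function name low_rank_factors, the 256 x 512 weight shape, and the rank of 32 are all hypothetical choices for illustration only. The sketch replaces one dense weight matrix with two thinner ones and compares parameter counts.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Split weight (out x in) into A (out x rank) and B (rank x in).

    Assumption: truncated SVD is used as the factorization; the paper's
    abstract does not specify the exact method.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    a = u[:, :rank] * root            # scale each kept column of U
    b = root[:, None] * vt[:rank]     # scale each kept row of V^T
    return a, b


rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 512))   # hypothetical dense layer weights
a, b = low_rank_factors(weight, rank=32)

x = rng.standard_normal(512)
y_full = weight @ x        # original layer: one large matrix product
y_low = a @ (b @ x)        # decomposed layer: two smaller products

params_full = weight.size
params_low = a.size + b.size
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
```

A lower parameter count does not by itself guarantee a faster layer. Consistent with the abstract's point about hardware awareness, whether the two smaller products actually run faster than the original one would need to be measured on the target device before deciding to decompose that layer.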

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image Processing Techniques · Medical Image Segmentation Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings