Architecture Aware Latency Constrained Sparse Neural Networks

Tianli Zhao; Qinghao Hu; Xiangyu He; Weixiang Xu; Jiaxing Wang; Cong Leng; Jian Cheng

arXiv:2109.00170 · cs.CV · September 2, 2021

Open Access

TL;DR

This paper introduces ALCS, a framework for pruning and accelerating CNNs on mobile devices under a latency constraint. It combines architecture-aware (SIMD-structured) pruning, a new sparse convolution algorithm, and run-time estimation for sparse models.

Contribution

It presents SIMD-structured pruning (an architecture-aware pruning scheme), a sparse convolution algorithm for efficient computation, and latency estimation by piece-wise linear interpolation, all integrated into a constrained optimization problem solved with ADMM.
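
To make the idea of SIMD-structured pruning concrete, here is a minimal sketch in Python. It assumes that weights are grouped into runs of `simd_width` consecutive values and that whole groups are kept or zeroed based on their L2 norm. The grouping layout, the norm criterion, and the names `simd_structured_prune`, `simd_width`, and `sparsity` are illustrative assumptions, not details taken from the paper.

```python
import math


def simd_structured_prune(weights, simd_width=4, sparsity=0.5):
    """Zero out whole groups of `simd_width` consecutive weights.

    Assumption (not from the paper): groups are ranked by L2 norm and the
    lowest-norm fraction `sparsity` of groups is pruned.
    """
    if len(weights) % simd_width != 0:
        raise ValueError("len(weights) must be a multiple of simd_width")

    # Split the flat weight list into SIMD-sized groups.
    groups = [weights[i:i + simd_width]
              for i in range(0, len(weights), simd_width)]

    # Score each group by its L2 norm.
    norms = [math.sqrt(sum(w * w for w in g)) for g in groups]

    # Select the groups with the smallest norms for pruning.
    num_pruned = int(len(groups) * sparsity)
    order = sorted(range(len(groups)), key=lambda i: norms[i])
    pruned = set(order[:num_pruned])

    # Rebuild the flat list, zeroing pruned groups as a unit.
    out = []
    for i, g in enumerate(groups):
        out.extend([0.0] * simd_width if i in pruned else g)
    return out
```

Because every surviving group has exactly `simd_width` contiguous values, a kernel could load each one with a single vector instruction, which is the intuition behind aligning the sparsity pattern with the hardware.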

Findings

01. Achieves better accuracy-latency trade-offs on mobile devices.

02. Demonstrates the effectiveness of SIMD-structured pruning and sparse convolution.

03. Provides a practical latency estimation method for sparse models.

Abstract

Acceleration of deep neural networks to meet a specific latency constraint is essential for their deployment on mobile devices. In this paper, we design an architecture aware latency constrained sparse (ALCS) framework to prune and accelerate CNN models. Taking modern mobile computation architectures into consideration, we propose Single Instruction Multiple Data (SIMD)-structured pruning, along with a novel sparse convolution algorithm for efficient computation. Besides, we propose to estimate the run time of sparse models with piece-wise linear interpolation. The whole latency constrained pruning task is formulated as a constrained optimization problem that can be efficiently solved with Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that our system-algorithm co-design framework can achieve much better Pareto frontier among network accuracy and latency…
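
The abstract's run-time estimate by piece-wise linear interpolation can be sketched as follows. This is a minimal illustration that assumes latency is measured at a few sample points (here, numbers of nonzero weights in a layer) and interpolated linearly in between. The choice of the nonzero count as the independent variable, the clamping outside the measured range, and the name `estimate_latency` are assumptions, and the sample values in the usage example are made up.

```python
from bisect import bisect_right


def estimate_latency(nnz, sample_nnz, sample_latency_ms):
    """Piece-wise linear interpolation of latency between measured points.

    `sample_nnz` must be strictly increasing. Queries outside the measured
    range are clamped to the nearest endpoint (an assumption of this sketch).
    """
    if len(sample_nnz) != len(sample_latency_ms) or len(sample_nnz) < 2:
        raise ValueError("need at least two (nnz, latency) samples")

    if nnz <= sample_nnz[0]:
        return sample_latency_ms[0]
    if nnz >= sample_nnz[-1]:
        return sample_latency_ms[-1]

    # Locate the segment [x0, x1] that contains nnz.
    hi = bisect_right(sample_nnz, nnz)
    lo = hi - 1
    x0, x1 = sample_nnz[lo], sample_nnz[hi]
    y0, y1 = sample_latency_ms[lo], sample_latency_ms[hi]

    # Linear interpolation within the segment.
    t = (nnz - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical measurements: nonzero weight count -> latency in ms.
samples_x = [0, 1000, 2000, 4000]
samples_y = [0.5, 1.5, 2.0, 4.0]
latency = estimate_latency(1500, samples_x, samples_y)
```

A lookup like this is cheap to evaluate, so it could be called repeatedly inside an optimization loop instead of benchmarking each candidate sparse model on the device.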

Taxonomy

Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Indoor and Outdoor Localization Technologies

Methods: Pruning · Attentive Walk-Aggregating Graph Neural Network · Convolution