Architecture Aware Latency Constrained Sparse Neural Networks
Tianli Zhao, Qinghao Hu, Xiangyu He, Weixiang Xu, Jiaxing Wang, Cong Leng, Jian Cheng

TL;DR
This paper introduces ALCS, a framework for pruning and accelerating CNNs on mobile devices under a latency constraint. It combines SIMD-structured (architecture-aware) pruning, a new sparse convolution algorithm, and run-time estimation for sparse models.
Contribution
It presents a SIMD-structured pruning method designed around mobile computation architectures, a sparse convolution algorithm, and a latency estimation approach based on piece-wise linear interpolation. All three are integrated into a constrained optimization problem that is solved with ADMM.
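As a rough illustration of what SIMD-structured pruning could look like, the sketch below zeroes out whole groups of consecutive weights so that each surviving group fills one SIMD register. The group size, the L2-norm ranking criterion, and the function name `simd_structured_prune` are assumptions made for this example; the summary does not specify how the paper groups or ranks weights.

```python
from typing import List


def simd_structured_prune(
    weights: List[float], group_size: int, keep_ratio: float
) -> List[float]:
    """Zero whole groups of `group_size` consecutive weights.

    Assumption: groups are ranked by squared L2 norm and the top
    `keep_ratio` fraction is kept. This is an illustrative criterion,
    not necessarily the one used in the paper.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a positive multiple of group_size")

    n_groups = len(weights) // group_size
    n_keep = max(1, int(n_groups * keep_ratio))

    # Squared L2 norm of each group of consecutive weights.
    norms = [
        sum(w * w for w in weights[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]

    # Indices of the groups with the largest norms.
    ranked = sorted(range(n_groups), key=lambda g: norms[g], reverse=True)
    keep = set(ranked[:n_keep])

    # Keep a weight only if its whole group survives.
    return [w if (i // group_size) in keep else 0.0 for i, w in enumerate(weights)]


# Hypothetical 16-weight row pruned with a SIMD width of 4 lanes.
row = [0.1, 0.2, 0.1, 0.0, 1.0, -2.0, 0.5, 0.3,
       0.0, 0.05, 0.0, 0.0, -0.9, 0.8, 0.7, 0.6]
pruned = simd_structured_prune(row, group_size=4, keep_ratio=0.5)
```

Because zeros appear only in aligned blocks, a kernel can skip a pruned group with a single index check and process each kept group with one vector instruction, which is the motivation for pruning at this granularity.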
Findings
Achieves better accuracy-latency trade-offs on mobile devices.
Demonstrates the effectiveness of SIMD-structured pruning and sparse convolution.
Provides a practical latency estimation method for sparse models.
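The latency estimation idea can be sketched as follows: measure a layer's run time at a few sparsity levels on the target device, then interpolate linearly between neighbouring measurements. The function name `estimate_latency`, the choice of sparsity as the interpolation variable, and the sample numbers are assumptions for illustration, not values from the paper.

```python
from bisect import bisect_right
from typing import Sequence


def estimate_latency(
    sparsity: float, knots: Sequence[float], latencies: Sequence[float]
) -> float:
    """Piece-wise linear interpolation of measured latencies.

    `knots` are sparsity levels in increasing order, and `latencies`
    are the run times measured at those levels.
    """
    if len(knots) != len(latencies) or len(knots) < 2:
        raise ValueError("need at least two (sparsity, latency) measurements")

    # Clamp queries outside the measured range to the nearest endpoint.
    if sparsity <= knots[0]:
        return latencies[0]
    if sparsity >= knots[-1]:
        return latencies[-1]

    # Locate the segment [knots[i], knots[i + 1]] containing the query.
    i = bisect_right(knots, sparsity) - 1
    x0, x1 = knots[i], knots[i + 1]
    y0, y1 = latencies[i], latencies[i + 1]

    # Linear interpolation inside the segment.
    t = (sparsity - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical measurements for one layer (milliseconds).
measured_sparsity = [0.0, 0.5, 1.0]
measured_latency_ms = [10.0, 6.0, 1.0]
estimate_ms = estimate_latency(0.25, measured_sparsity, measured_latency_ms)
```

Under this sketch, a whole-network estimate would be the sum of the per-layer estimates, which keeps the latency model cheap to evaluate inside an optimization loop.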
Abstract
Acceleration of deep neural networks to meet a specific latency constraint is essential for their deployment on mobile devices. In this paper, we design an architecture aware latency constrained sparse (ALCS) framework to prune and accelerate CNN models. Taking modern mobile computation architectures into consideration, we propose Single Instruction Multiple Data (SIMD)-structured pruning, along with a novel sparse convolution algorithm for efficient computation. Besides, we propose to estimate the run time of sparse models with piece-wise linear interpolation. The whole latency constrained pruning task is formulated as a constrained optimization problem that can be efficiently solved with Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that our system-algorithm co-design framework can achieve much better Pareto frontier among network accuracy and latency…
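To show the general shape of an ADMM solver for constrained pruning, here is a toy sketch. It is not the paper's formulation: the latency constraint is replaced by a simple budget of at most `k` nonzero weights, and the training loss by a quadratic distance to a reference vector `a`. Both substitutions, and the name `admm_prune`, are assumptions made so the example stays self-contained.

```python
from typing import List


def project_top_k(v: List[float], k: int) -> List[float]:
    """Euclidean projection onto vectors with at most k nonzeros."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def admm_prune(a: List[float], k: int, rho: float = 1.0, iters: int = 50) -> List[float]:
    """Toy ADMM for: minimize 0.5 * ||w - a||^2  s.t.  w has <= k nonzeros.

    The problem is split as w = z, where z carries the constraint.
    """
    n = len(a)
    z = [0.0] * n  # constrained copy of the weights
    u = [0.0] * n  # scaled dual variable

    for _ in range(iters):
        # w-update: closed-form minimizer of the loss plus the proximal term.
        w = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: project w + u onto the constraint set.
        z = project_top_k([w[i] + u[i] for i in range(n)], k)
        # Dual update: accumulate the residual w - z.
        u = [u[i] + w[i] - z[i] for i in range(n)]

    return z


# Hypothetical reference weights with a budget of two nonzeros.
solution = admm_prune([3.0, -0.1, 0.2, -2.0], k=2)
```

The alternating structure is the point of the sketch: the unconstrained update and the projection are each simple on their own, so a constraint that is hard to handle directly can be enforced through repeated projection and dual correction.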
Taxonomy
Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Indoor and Outdoor Localization Technologies
Methods: Pruning · Attentive Walk-Aggregating Graph Neural Network · Convolution
