Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices
Hayun Lee, Dongkun Shin

TL;DR
This paper introduces a fast, pseudo-optimal block selection algorithm and an efficient inference kernel for unaligned block-wise pruning, enabling effective DNN acceleration on mobile devices with minimal accuracy loss.
Contribution
It presents the BED algorithm for rapid unaligned block selection and a mobile-optimized inference kernel, improving DNN pruning efficiency and latency on mobile hardware.
Findings
Unaligned block-wise pruning achieved latency similar to that of aligned block-wise pruning.
Demonstrated effectiveness on MobileNet and ResNet models on real mobile devices.
Reduced accuracy drop while maintaining high speedup.
Abstract
With the recent proliferation of on-device AI, there is an increasing need to run computationally intensive DNNs directly on mobile devices. However, the limited computing and memory resources of these devices necessitate effective pruning techniques. Block-wise pruning is promising due to its low accuracy drop tradeoff for speedup gains, but it requires block positions to be aligned with block size, hindering optimal position selection to minimize model accuracy drop. Unaligned block pruning (UBP) addresses this by allowing blocks to be selected at arbitrary positions, yet its practical use is limited by a time-consuming optimal block selection algorithm and lack of efficient inference kernels. In this paper, we propose a pseudo-optimal yet fast block selection algorithm called Block Expansion and Division (BED), which can be integrated into an iterative model training process.…
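The difference between aligned and unaligned block selection described in the abstract can be illustrated with a small sketch. It is a deliberately simplified, one-dimensional toy and not the paper's BED algorithm or its inference kernel: it assumes each weight has a non-negative importance score (for example its magnitude), that exactly `k` non-overlapping blocks of length `block` are kept, and that the goal is to maximize the total retained score. The function names `aligned_blocks` and `unaligned_blocks` and the sample scores are hypothetical. The unaligned variant uses a plain exact dynamic program, which stands in here for the kind of optimal search the abstract calls time-consuming at scale.

```python
from typing import List, Sequence, Tuple


def aligned_blocks(scores: Sequence[int], block: int, k: int) -> Tuple[List[int], int]:
    """Keep the k best blocks whose start index is a multiple of `block`."""
    starts = range(0, len(scores) - block + 1, block)
    ranked = sorted(starts, key=lambda s: sum(scores[s:s + block]), reverse=True)
    kept = sorted(ranked[:k])
    return kept, sum(sum(scores[s:s + block]) for s in kept)


def unaligned_blocks(scores: Sequence[int], block: int, k: int) -> Tuple[List[int], int]:
    """Keep the k best non-overlapping blocks at arbitrary start indices.

    Exact dynamic program over prefixes (illustrative only, not BED).
    """
    n = len(scores)
    if k * block > n:
        raise ValueError("k blocks of this size do not fit in the row")

    # prefix[i] = sum of scores[0:i], so a block sum is one subtraction.
    prefix = [0]
    for value in scores:
        prefix.append(prefix[-1] + value)

    neg_inf = float("-inf")
    # best[j][i] = best total score using j blocks inside scores[0:i].
    best = [[0] * (n + 1)] + [[neg_inf] * (n + 1) for _ in range(k)]
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            best[j][i] = best[j][i - 1]  # option 1: no block ends at i
            if i >= block and best[j - 1][i - block] != neg_inf:
                # option 2: a block covers scores[i - block:i]
                candidate = best[j - 1][i - block] + prefix[i] - prefix[i - block]
                if candidate > best[j][i]:
                    best[j][i] = candidate

    # Walk back through the table to recover the chosen start positions.
    starts: List[int] = []
    i, j = n, k
    while j > 0:
        if best[j][i] == best[j][i - 1]:
            i -= 1
        else:
            starts.append(i - block)
            i -= block
            j -= 1
    return sorted(starts), best[k][n]


if __name__ == "__main__":
    # Hypothetical importance scores for one row of eight weights.
    row = [1, 9, 8, 1, 1, 1, 7, 6]
    print("aligned  :", aligned_blocks(row, block=2, k=2))
    print("unaligned:", unaligned_blocks(row, block=2, k=2))
```

On the sample row, the aligned grid splits the two most important adjacent weights (indices 1 and 2) across two different blocks, so it retains a total score of 23 with blocks starting at 0 and 6. The unaligned search can start a block at index 1 and retains 30 with blocks starting at 1 and 6. This is the kind of gap in retained importance that motivates unaligned selection.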
Taxonomy
Topics: Energy Harvesting in Wireless Networks · Robotics and Automated Systems · Opportunistic and Delay-Tolerant Networks
Methods: Average Pooling · Kaiming Initialization · Max Pooling · Convolution · Global Average Pooling · Pruning
