Tiered Pruning for Efficient Differentiable Inference-Aware Neural Architecture Search
Sławomir Kierat, Mateusz Sieniawski, Denys Fridman, Chen-Han Yu, Szymon Migacz, Paweł Morkisz, Alex Fit-Florea

TL;DR
This paper introduces three pruning techniques for inference-aware differentiable neural architecture search. The resulting PruNet models set a new Pareto frontier of inference latency versus ImageNet Top-1 accuracy on NVIDIA V100 and outperform GPUNet and EfficientNet as backbones on COCO object detection.
Contribution
It presents Prunode, a stochastic bi-path building block, and novel pruning algorithms for blocks and layers within the SuperNet, advancing inference-aware neural architecture search.
Findings
Establishes a new Pareto frontier of inference latency versus ImageNet Top-1 accuracy on NVIDIA V100.
As a backbone, PruNet outperforms GPUNet and EfficientNet on COCO object detection in inference latency relative to mAP.
Prunode searches over inner hidden dimensions with O(1) memory and compute complexity.
Abstract
We propose three novel pruning techniques to improve the cost and results of inference-aware Differentiable Neural Architecture Search (DNAS). First, we introduce Prunode, a stochastic bi-path building block for DNAS, which can search over inner hidden dimensions with O(1) memory and compute complexity. Second, we present an algorithm for pruning blocks within a stochastic layer of the SuperNet during the search. Third, we describe a novel technique for pruning unnecessary stochastic layers during the search. The optimized models resulting from the search are called PruNet and establish a new state-of-the-art Pareto frontier for NVIDIA V100 in terms of inference latency for ImageNet Top-1 image classification accuracy. PruNet as a backbone also outperforms GPUNet and EfficientNet on the COCO object detection task on inference latency relative to mean Average Precision (mAP).
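The abstract does not spell out how a bi-path block can search inner hidden dimensions at constant memory. One plausible reading, sketched below in plain Python, is that both candidate widths share a single hidden activation and differ only by a channel mask, with Gumbel-softmax weights blending the two paths. Every name here (`gumbel_softmax`, `channel_mask`, `bipath_mix`) and the masking scheme itself are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return relaxed one-hot weights sampled from `logits` (Gumbel-softmax)."""
    # Gumbel(0, 1) noise is -log(-log(U)) for U ~ Uniform(0, 1).
    noisy = [
        (logit - math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9)))) / tau
        for logit in logits
    ]
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def channel_mask(active, total):
    """Binary mask that keeps the first `active` of `total` channels."""
    return [1.0 if i < active else 0.0 for i in range(total)]


def bipath_mix(hidden, dim_small, dim_large, logits, tau=1.0, rng=random):
    """Blend two candidate inner widths over ONE shared hidden activation.

    Assumption: the two paths share weights and differ only by their mask,
    so storage does not grow with the number of widths being considered.
    """
    w_small, w_large = gumbel_softmax(logits, tau, rng)
    n = len(hidden)
    mask_small = channel_mask(dim_small, n)
    mask_large = channel_mask(dim_large, n)
    return [
        h * (w_small * ms + w_large * ml)
        for h, ms, ml in zip(hidden, mask_small, mask_large)
    ]


# Hypothetical usage: 8 hidden channels, candidate widths 4 and 8.
out = bipath_mix([1.0] * 8, 4, 8, [0.0, 0.0], rng=random.Random(0))
```

In this sketch the first four channels are kept by both paths and so pass through with total weight 1, while the last four are scaled by the weight of the wider path alone. A search procedure could then move the two candidate widths according to which path the architecture weights favor; that step is not shown and is likewise an assumption.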
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Pruning · Pointwise Convolution · RMSProp · Batch Normalization · Sigmoid Activation · Squeeze-and-Excitation Block · 1x1 Convolution · Gumbel Softmax · Depthwise Convolution
