LANA: Latency Aware Network Acceleration
Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat

TL;DR
LANA accelerates a trained neural network by using neural architecture search to swap its layers for more efficient operations, producing models that meet a target latency while preserving or improving accuracy.
Contribution
LANA introduces a novel constrained ILP-based NAS approach that efficiently searches massive spaces and satisfies latency budgets, outperforming prior methods.
Findings
Achieves up to 3% accuracy improvement on compressed models.
Provides up to 5x speed-up with minimal accuracy loss.
Supports large search spaces and fast NAS within minutes.
Abstract
We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization (ILP) approach. ILP brings unique properties as it (i) performs NAS within a few seconds to minutes, (ii) easily satisfies budget constraints, (iii) works on the layer-granularity, and (iv) supports a huge search space, surpassing prior search approaches in efficacy and efficiency. In extensive experiments, we show that LANA yields efficient and accurate models constrained by a target latency…
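The second phase described in the abstract picks exactly one operation per layer so that total latency stays within a budget while the accuracy penalty is as small as possible. The sketch below illustrates that selection problem on a toy instance. It is not the paper's implementation: the candidate names, latencies, and error scores are hypothetical, the per-layer costs are assumed to add up, and the problem is solved by exhaustive enumeration rather than by an ILP solver, which is only workable for a handful of layers.

```python
from itertools import product

# Hypothetical toy data (not from the paper): for each layer, the candidate
# operations as (name, latency_ms, error_increase) measured after distillation.
CANDIDATES = [
    [("teacher", 4.0, 0.00), ("conv3x3", 2.0, 0.02), ("identity", 0.0, 0.10)],
    [("teacher", 5.0, 0.00), ("dw_sep", 2.5, 0.03), ("identity", 0.0, 0.20)],
    [("teacher", 3.0, 0.00), ("conv1x1", 1.0, 0.01), ("identity", 0.0, 0.05)],
]


def select_operations(candidates, latency_budget):
    """Pick one operation per layer, minimizing summed error under a latency cap.

    Returns (names, total_latency, total_error), or None if no assignment
    fits the budget. Enumerates every combination, so the cost grows
    exponentially with the number of layers.
    """
    best = None
    for choice in product(*candidates):
        total_latency = sum(op[1] for op in choice)
        if total_latency > latency_budget:
            continue  # violates the latency constraint
        total_error = sum(op[2] for op in choice)
        if best is None or total_error < best[2]:
            best = ([op[0] for op in choice], total_latency, total_error)
    return best


if __name__ == "__main__":
    print(select_operations(CANDIDATES, latency_budget=6.0))
```

An ILP solver handles the same objective and constraint with one binary variable per (layer, operation) pair, which is what makes the search spaces mentioned in the abstract tractable; the enumeration here only makes the objective and the budget constraint concrete.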
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Brain Tumor Detection and Classification
Methods: EfficientNetV2 · Pointwise Convolution · Depthwise Convolution · Sigmoid Activation · Dropout · Depthwise Separable Convolution · Convolution · Batch Normalization · Average Pooling
