LANA: Latency Aware Network Acceleration

Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat

arXiv:2107.10624 · cs.CV · November 19, 2021

TL;DR

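LANA accelerates a trained (teacher) network by combining neural architecture search with teacher-student distillation. It first trains efficient alternative operations for each layer, then uses integer linear programming to select a combination that meets a specified latency budget, giving higher accuracy than comparably fast baselines.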

Contribution

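LANA formulates the choice of per-layer operations as a constrained integer linear program (ILP). The ILP searches very large spaces within seconds to minutes, directly enforces the latency budget, and outperforms prior search approaches in both efficacy and efficiency.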
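For concreteness, a generic form of such a constrained selection problem is written below. This is a sketch under stated assumptions, not the paper's exact objective: $s_{l,i} \in \{0,1\}$ indicates that candidate operation $i$ is chosen for layer $l$, $a_{l,i}$ is a hypothetical per-candidate score (for example, an estimate of its effect on accuracy), $c_{l,i}$ is its measured latency, $T$ is the latency budget, and both score and latency are assumed to add up across the $L$ layers, each with $K$ candidates.

$$
\max_{s}\; \sum_{l=1}^{L}\sum_{i=1}^{K} a_{l,i}\,s_{l,i}
\quad \text{s.t.} \quad
\sum_{l=1}^{L}\sum_{i=1}^{K} c_{l,i}\,s_{l,i} \le T,
\qquad
\sum_{i=1}^{K} s_{l,i} = 1 \;\; \forall\, l,
\qquad
s_{l,i} \in \{0,1\}.
$$

The equality constraint forces exactly one operation per layer, and the inequality makes the budget a hard constraint, so under the additive-latency assumption every feasible solution respects $T$.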

Findings

01. Improves accuracy by up to 3% when larger models are compressed to the latency of smaller ones.

02. Achieves up to 5x speed-up with minor or no accuracy loss.

03. Searches spaces of roughly $10^{100}$ configurations in seconds to minutes.
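The gap between the number of candidate architectures and the number of ILP decision variables can be seen with simple counting. A minimal sketch, assuming $K$ candidate operations per layer chosen independently across $L$ layers and one binary indicator per (layer, candidate) pair; the values $K = 10$ and $L = 100$ are hypothetical, picked only to land at $10^{100}$, and are not LANA's actual configuration.

```python
def search_space_size(num_candidates: int, num_layers: int) -> int:
    """Number of distinct architectures when one candidate is picked per layer."""
    return num_candidates ** num_layers


def num_binary_variables(num_candidates: int, num_layers: int) -> int:
    """Number of 0/1 decision variables with one indicator per (layer, candidate)."""
    return num_candidates * num_layers


# Hypothetical sizes, for illustration only (not the paper's search space).
K, L = 10, 100
configs = search_space_size(K, L)
variables = num_binary_variables(K, L)
print(configs == 10 ** 100)  # True
print(variables)  # 1000
```

Configurations grow exponentially with depth while the number of decision variables grows only linearly, so the solver works with a compact description of the space instead of enumerating it.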

Abstract

We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization (ILP) approach. ILP brings unique properties as it (i) performs NAS within a few seconds to minutes, (ii) easily satisfies budget constraints, (iii) works at layer granularity, and (iv) supports a huge search space of $O(10^{100})$, surpassing prior search approaches in efficacy and efficiency. In extensive experiments, we show that LANA yields efficient and accurate models constrained by a target latency…
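The two phases can be illustrated end to end on toy data. This is a sketch under explicit assumptions, not the authors' implementation: feature maps are flat lists of floats, each already-trained candidate is scored by its negative mean squared error against the teacher layer's output (a hypothetical proxy for distillation quality), latencies are non-negative integers in arbitrary units, and the selection step is solved exactly with a dynamic program for the multiple-choice knapsack form of the problem, swapped in here in place of a general ILP solver. The names `feature_map_mse` and `select_operations` are hypothetical.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def feature_map_mse(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Layer-wise distillation error: MSE between teacher and candidate feature maps."""
    if len(teacher) != len(student):
        raise ValueError("feature maps must have the same size")
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def select_operations(
    scores: List[List[float]],   # scores[l][i]: score of candidate i at layer l (higher is better)
    latencies: List[List[int]],  # latencies[l][i]: non-negative integer latency of that candidate
    budget: int,                 # total latency budget in the same units
) -> Tuple[float, List[int]]:
    """Pick exactly one candidate per layer, maximizing total score within the budget."""
    best = [0.0] + [NEG_INF] * budget  # best[b]: best score so far using exactly b latency
    picks: List[List[int]] = []        # picks[l][b]: candidate used at layer l to reach b
    for layer_scores, layer_lats in zip(scores, latencies):
        new_best = [NEG_INF] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b, prev in enumerate(best):
            if prev == NEG_INF:
                continue  # latency b is unreachable with the layers seen so far
            for i, (s, c) in enumerate(zip(layer_scores, layer_lats)):
                nb = b + c
                if nb <= budget and prev + s > new_best[nb]:
                    new_best[nb] = prev + s
                    pick[nb] = i
        best = new_best
        picks.append(pick)

    total = max(best)
    if total == NEG_INF:
        raise ValueError("no selection fits within the latency budget")
    b = best.index(total)
    selection: List[int] = []
    for l in range(len(picks) - 1, -1, -1):  # walk back from the last layer
        i = picks[l][b]
        selection.append(i)
        b -= latencies[l][i]
    selection.reverse()
    return total, selection


# Toy example: two layers; candidate 0 in each layer reproduces the teacher op exactly.
teacher_maps = [[1.0, 2.0], [0.5, 0.5]]
candidate_maps = [
    [[1.0, 2.0], [1.0, 1.0], [0.0, 0.0]],  # layer 0 candidates
    [[0.5, 0.5], [0.5, 0.0]],              # layer 1 candidates
]
latencies = [[5, 2, 1], [4, 2]]
scores = [
    [-feature_map_mse(t, c) for c in cands]
    for t, cands in zip(teacher_maps, candidate_maps)
]
total, selection = select_operations(scores, latencies, budget=7)
print(total, selection)  # -0.125 [0, 1]
```

With a budget of 7 units, the sketch keeps the exact operation in layer 0 and swaps layer 1 for its cheaper, slightly less faithful candidate. The dynamic program is exact, but its cost grows with the budget expressed in integer units; an ILP solver avoids discretizing latency and can take additional linear constraints.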

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Brain Tumor Detection and Classification

Methods: EfficientNetV2 · Pointwise Convolution · Depthwise Convolution · Sigmoid Activation · Dropout · Depthwise Separable Convolution · Convolution · Batch Normalization · Average Pooling