NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Yong Guo; Yin Zheng; Mingkui Tan; Qi Chen; Jian Chen; Peilin Zhao; Junzhou Huang

arXiv:1910.14488 · cs.LG · January 14, 2020 · 73 cites

TL;DR

This paper introduces NAT, a reinforcement learning-based method to optimize neural network architectures by replacing redundant operations with more efficient ones, resulting in more accurate and compact models.

Contribution

The paper proposes a reinforcement learning approach that transforms a neural architecture by replacing its redundant operations, improving performance without introducing extra computation cost.

Findings

01. Transformed architectures outperform original and existing optimized models on CIFAR-10.

02. NAT effectively reduces model complexity while maintaining or improving accuracy.

03. Experiments on ImageNet demonstrate significant performance gains with NAT.

Abstract

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant…
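The MDP framing in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the operation names, the cost table, the rule that a replacement may not cost more than the operation it replaces, and the random policy that stands in for a learned one.

```python
import random

# Hypothetical cost table; the operation names and costs are illustrative only.
COST = {"conv_3x3": 2, "max_pool_3x3": 1, "skip_connect": 0, "none": 0}

# Shared seeded generator so the example is reproducible.
_rng = random.Random(0)


def allowed_actions(op):
    """Return replacements that cost no more than the current operation.

    The current operation is always included, so "keep as is" is a valid action.
    """
    return [cand for cand in COST if COST[cand] <= COST[op]]


def total_cost(arch):
    """Sum the (hypothetical) cost of every operation in the architecture."""
    return sum(COST[op] for op in arch)


def random_policy(op, candidates):
    """Placeholder for a learned policy: pick any allowed replacement."""
    return _rng.choice(candidates)


def transform(arch, policy):
    """One MDP step: the state is the architecture, the action picks an op per edge."""
    return [policy(op, allowed_actions(op)) for op in arch]


# Toy architecture, written as a flat list of edge operations (an assumption).
arch = ["conv_3x3", "max_pool_3x3", "skip_connect", "conv_3x3"]
new_arch = transform(arch, random_policy)
print(new_arch, total_cost(arch), total_cost(new_arch))
```

Because each replacement is restricted to operations of equal or lower cost, the transformed architecture in this sketch can never be more expensive than the input, which mirrors the "without introducing extra computation cost" constraint. A real system would train the policy against validation accuracy rather than sample at random.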

Code & Models

Repositories

guoyongcs/NAT (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing