Towards Accurate and Compact Architectures via Neural Architecture Transformer

Yong Guo; Yin Zheng; Mingkui Tan; Qi Chen; Zhipeng Li; Jian Chen; Peilin Zhao; Junzhou Huang

arXiv:2102.10301 · cs.CV · February 23, 2021 · 1 citation

TL;DR

This paper introduces NAT and NAT++, methods that optimize neural network architectures by replacing redundant operations with more efficient ones, leading to more accurate and compact models without extra computational cost.

Contribution

The paper proposes NAT++, which enlarges the transition space for architecture optimization using a two-level transition rule and a Binary-Masked Softmax (BMSoftmax) layer that omits invalid transitions, improving performance over previous methods.
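A minimal sketch of the masked-softmax idea, assuming a BMSoftmax-style layer simply assigns zero probability to transitions flagged as invalid by a binary mask and normalizes over the rest. The candidate names, the `TYPE_RANK` ordering, and the `valid_mask` rule (an operation-type level plus a kernel-size level) are hypothetical stand-ins chosen for illustration; this is not the authors' implementation, and the real two-level rule may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate set and type ranking, chosen only for illustration.
CANDIDATES = ["null", "skip", "sep_conv_3x3", "sep_conv_5x5", "conv_3x3", "conv_5x5"]
TYPE_RANK = {"sep_conv": 0, "conv": 1}  # assumed: lower rank means a cheaper operation type


def parse(op: str) -> Tuple[str, int]:
    """Split e.g. "sep_conv_5x5" into ("sep_conv", 5)."""
    kind, size = op.rsplit("_", 1)
    return kind, int(size.split("x")[0])


def valid_mask(src: str) -> List[int]:
    """Binary mask over CANDIDATES: 1 if `src` may become that candidate (assumed rule)."""
    if src == "null":
        return [int(c == "null") for c in CANDIDATES]
    if src == "skip":
        return [int(c in ("null", "skip")) for c in CANDIDATES]
    kind, k = parse(src)
    mask = []
    for cand in CANDIDATES:
        if cand in ("null", "skip"):
            mask.append(1)  # dropping to a skip or null connection is always allowed
            continue
        cand_kind, cand_k = parse(cand)
        # Level 1: operation type no costlier; level 2: kernel size no larger.
        mask.append(int(TYPE_RANK[cand_kind] <= TYPE_RANK[kind] and cand_k <= k))
    return mask


def binary_masked_softmax(logits: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over entries with mask == 1; masked entries get probability exactly 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    valid = [x for x, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one transition must be valid")
    peak = max(valid)  # subtract the max valid logit for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = valid_mask("sep_conv_3x3")
    probs = binary_masked_softmax([0.5, 0.1, 1.0, 2.0, 3.0, 0.0], mask)
    for cand, p in zip(CANDIDATES, probs):
        print(f"{cand:>14}: {p:.3f}")
```

Masking before normalization, rather than zeroing probabilities afterwards, keeps the distribution summing to 1 over valid transitions only, so a policy sampling from it can never pick an invalid (costlier) operation.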

Findings

1. Transformed architectures outperform both the original architectures and existing optimized models.
2. NAT++ achieves higher accuracy with fewer parameters.
3. The approach effectively reduces redundancy in neural architectures.

Abstract

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible transitions and thus comes with a limited search/transition space. As a…
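To make the "small number of possible transitions" concrete, here is a minimal sketch of a restricted transition space in the spirit of NAT, assuming each operation may only be kept, replaced by a skip connection, or replaced by a null connection, and that a skip connection may only stay or become null. The operation names, the `OP_COST` values, and the helper functions are hypothetical placeholders; the sketch covers only the transition space, not the MDP formulation or the policy that learns which transitions to take.

```python
from typing import Dict, List

# Hypothetical operation set with illustrative relative costs (placeholders, not paper numbers).
# The only property relied on is null < skip < any parametric or pooling op.
OP_COST: Dict[str, int] = {
    "null": 0,
    "skip": 1,
    "sep_conv_3x3": 2,
    "max_pool_3x3": 2,
    "conv_3x3": 3,
}


def nat_transitions(op: str) -> List[str]:
    """Allowed replacements for `op` in a restricted, NAT-style transition space (assumed):
    keep the op, or swap it for a skip or null connection; never move to a costlier op."""
    if op == "null":
        return ["null"]
    if op == "skip":
        return ["skip", "null"]
    return [op, "skip", "null"]


def transform_cell(cell: Dict[str, str], choice: Dict[str, str]) -> Dict[str, str]:
    """Apply the chosen transition on each edge; edges without a choice keep their op."""
    new_cell: Dict[str, str] = {}
    for edge, op in cell.items():
        new_op = choice.get(edge, op)
        if new_op not in nat_transitions(op):
            raise ValueError(f"invalid transition {op} -> {new_op} on edge {edge}")
        new_cell[edge] = new_op
    return new_cell


def cell_cost(cell: Dict[str, str]) -> int:
    """Total relative cost of a cell under the placeholder OP_COST table."""
    return sum(OP_COST[op] for op in cell.values())


if __name__ == "__main__":
    cell = {"e0": "conv_3x3", "e1": "skip", "e2": "max_pool_3x3"}
    new_cell = transform_cell(cell, {"e0": "skip", "e1": "null"})
    print(new_cell, cell_cost(cell), "->", cell_cost(new_cell))
```

Because every allowed transition moves to an operation of equal or lower cost, any transformed cell costs no more than the original (here 6 drops to 3 under the placeholder table), which mirrors the goal of improving an architecture without introducing extra computational cost.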

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam