Towards Accurate and Compact Architectures via Neural Architecture Transformer
Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Zhipeng Li, Jian Chen, Peilin Zhao, Junzhou Huang

TL;DR
This paper introduces NAT and NAT++, methods that optimize neural network architectures by replacing redundant operations with more efficient ones, leading to more accurate and compact models without extra computational cost.
Contribution
The paper proposes NAT++, which enlarges the transition space for architecture optimization using a two-level rule and BMSoftmax, improving performance over previous methods.
Findings
Transformed architectures outperform original and existing optimized models.
NAT++ achieves higher accuracy with fewer parameters.
The approach effectively reduces redundancy in neural architectures.
Abstract
Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible transitions and thus comes with a limited search/transition space. As a…
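The replacement idea in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not from the paper: an architecture is flattened to a list of operation names, the costs in `OP_COST` are illustrative units, and a random policy stands in for the learned one. The function names are hypothetical. The only property it demonstrates is that each operation is either kept or replaced by a skip or null connection that costs no more, so the transformed architecture never costs more than the original.

```python
import random

# Illustrative cost units per operation (assumed values, not from the paper).
OP_COST = {
    "null": 0,
    "skip": 0,
    "max_pool_3x3": 0,
    "sep_conv_3x3": 2,
    "sep_conv_5x5": 4,
}


def allowed_transitions(op):
    """Return candidate replacements for `op`: the op itself, plus skip/null
    whenever they are no more costly than `op` (a simplified, assumed rule)."""
    candidates = {op, "skip", "null"}
    return sorted(c for c in candidates if OP_COST[c] <= OP_COST[op])


def transform(arch, policy):
    """One MDP-style step: the state is the current architecture, the action
    picks one replacement per operation, and the result is the next state."""
    return [policy(op, allowed_transitions(op)) for op in arch]


def total_cost(arch):
    """Sum the illustrative cost units over all operations."""
    return sum(OP_COST[op] for op in arch)


def make_random_policy(seed):
    """Stand-in for a learned policy: choose uniformly among allowed options."""
    rng = random.Random(seed)

    def policy(op, choices):
        return rng.choice(choices)

    return policy


original = ["sep_conv_5x5", "sep_conv_3x3", "max_pool_3x3", "skip", "null"]
transformed = transform(original, make_random_policy(seed=0))

# The transition rule guarantees the cost never increases.
assert total_cost(transformed) <= total_cost(original)
```

In this sketch the guarantee comes entirely from restricting the action set, not from the policy, so any policy plugged into `transform` yields an architecture that is no more expensive than its input.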
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam
