NAT: Neural Architecture Transformer for Accurate and Compact Architectures
Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang

TL;DR
This paper introduces NAT, a reinforcement learning-based method to optimize neural network architectures by replacing redundant operations with more efficient ones, resulting in more accurate and compact models.
Contribution
It proposes a novel reinforcement learning approach that transforms neural architectures by removing redundancies, improving performance without introducing extra computation cost.
Findings
Transformed architectures outperform original and existing optimized models on CIFAR-10.
NAT effectively reduces model complexity while maintaining or improving accuracy.
Experiments on ImageNet demonstrate significant performance gains with NAT.
Abstract
Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant…
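The MDP framing in the abstract can be illustrated with a small sketch, in which a state is an architecture, an action replaces the operation on one edge, and only replacements that do not raise the computation cost are allowed. This is an illustration under stated assumptions, not the authors' implementation: the operation names (including the hypothetical cheaper candidates `skip` and `none`), the cost values, and the helpers `total_cost`, `valid_actions`, and `step` are all invented here, and the learned policy and reward are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical relative computation costs per operation (illustrative values,
# not taken from the paper).
OP_COST: Dict[str, float] = {
    "conv_3x3": 9.0,
    "max_pool_3x3": 1.0,
    "skip": 0.0,   # assumed cheaper replacement candidate
    "none": 0.0,   # assumed "remove the connection" candidate
}

Edge = Tuple[int, int]      # (source node, target node)
Arch = Dict[Edge, str]      # MDP state: operation assigned to each edge
Action = Tuple[Edge, str]   # MDP action: replace the op on one edge


def total_cost(arch: Arch) -> float:
    """Sum the assumed cost of every operation in the architecture."""
    return sum(OP_COST[op] for op in arch.values())


def valid_actions(arch: Arch) -> List[Action]:
    """List replacements that change an op without increasing its cost."""
    actions: List[Action] = []
    for edge, op in arch.items():
        for new_op, cost in OP_COST.items():
            if new_op != op and cost <= OP_COST[op]:
                actions.append((edge, new_op))
    return actions


def step(arch: Arch, action: Action) -> Arch:
    """Apply one replacement and return the next state (input is not mutated)."""
    edge, new_op = action
    if OP_COST[new_op] > OP_COST[arch[edge]]:
        raise ValueError("replacement would add computation cost")
    next_arch = dict(arch)
    next_arch[edge] = new_op
    return next_arch


arch: Arch = {(0, 1): "conv_3x3", (0, 2): "max_pool_3x3"}
transformed = step(arch, ((0, 1), "skip"))
```

In a full method, a learned policy would choose among the actions returned by `valid_actions`, with a reward tied to the accuracy of the transformed architecture; the sketch covers only the state and transition structure.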
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
