Encodings for Prediction-based Neural Architecture Search
Yash Akhauri, Mohamed S. Abdelfattah

TL;DR
This paper investigates various encoding methods for predictor-based neural architecture search, introduces unified encodings for multiple search spaces, and presents a new predictor, FLAN, that significantly reduces training costs.
Contribution
It categorizes neural encodings, extends them to unified forms across search spaces, and develops FLAN, a predictor that greatly improves efficiency in NAS.
Findings
Unified encodings enable transfer across search spaces.
FLAN reduces predictor training costs by over an order of magnitude.
Extensive experiments validate the effectiveness of the proposed methods.
Abstract
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of latent representations to vectors of zero-cost proxies. In this paper, we categorize and investigate neural encodings from three main types: structural, learned, and score-based. Furthermore, we extend these encodings and introduce unified encodings that extend NAS predictors to multiple search spaces. Our analysis draws from experiments conducted on over 1.5 million neural network architectures on NAS spaces such as NASBench-101 (NB101), NB201, NB301, Network Design Spaces (NDS), and…
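To make the "structural" category concrete, here is a minimal sketch of an adjacency-matrix encoding in Python. It is an illustration only, not the paper's implementation: the operation vocabulary `OPS`, the function name `encode_architecture`, and the example cell are all hypothetical, and it assumes the nodes are topologically ordered so that only the upper triangle of the adjacency matrix carries information.

```python
from typing import List, Sequence

# Hypothetical operation vocabulary (assumption); each real search space
# defines its own set of operations.
OPS = ["input", "conv3x3", "conv1x1", "maxpool3x3", "output"]


def encode_architecture(
    adjacency: Sequence[Sequence[int]],
    ops: Sequence[str],
    vocab: Sequence[str] = OPS,
) -> List[int]:
    """Flatten a cell (DAG) into a fixed-length structural encoding.

    The vector is the strict upper triangle of the adjacency matrix
    followed by a one-hot vector for the operation at each node.
    """
    n = len(adjacency)
    if any(len(row) != n for row in adjacency):
        raise ValueError("adjacency matrix must be square")
    if len(ops) != n:
        raise ValueError("need exactly one operation per node")

    # Edges i -> j with i < j; valid because nodes are assumed to be
    # topologically ordered.
    edges = [adjacency[i][j] for i in range(n) for j in range(i + 1, n)]

    one_hot: List[int] = []
    for op in ops:
        row = [0] * len(vocab)
        row[vocab.index(op)] = 1  # raises ValueError for unknown operations
        one_hot.extend(row)

    return edges + one_hot


# Hypothetical 4-node cell: the input feeds two parallel branches
# that both connect to the output.
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
ops = ["input", "conv3x3", "maxpool3x3", "output"]
encoding = encode_architecture(adjacency, ops)
```

A vector like this is tied to one search space, since both the node count and the operation vocabulary are fixed in advance. That limitation is what motivates encodings that carry over across search spaces.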
Peer Reviews
Decision·ICML 2024 Poster
- The motivation to have a unified encoding across NAS spaces is important and as the authors mention, this is relevant when it comes to transfer learning across spaces and tasks. - The authors propose a new hybrid encoder that outperforms prior encodings and allows transferability of predictors to new search spaces. This leads to improved sample efficiency compared to training predictors from scratch on a new search space. - Large-scale study of NAS encodings over 13 NAS search spaces with
- It seems that the performance predictor is transferable across search spaces, and can predict the ranking relatively well. However, as far as I saw, this is done only on CIFAR-10, right? That means that if one wants to transfer a learned predictor to a new dataset, that would not be feasible with FLAN, or otherwise one would need to train FLAN on the said dataset from scratch. - Other than this, I do not have any major weaknesses regarding this paper. I think that this is an important work for
The paper is well-written and demonstrates an integration of ideas from prior work, further enhancing them to achieve SOTA sample efficiency in performance prediction and in sample-based NAS. Additionally, it shows superior Kendall tau correlation in performance prediction. The method also permits integration of new encodings, and hence makes it possible to take advantage of developments in architecture encoding, such as new ZCPs. Furthermore, the unified encoding facilitates transfer learning across differ
- The authors state that improvements gained in sample efficiency by pre-training FLAN on a source search space do not include the cost to pre-train the model. However, this makes sense only if the pre-training is done on a single source space and then transferred to any other space. From Table 4, this doesn’t seem to be the case, and each target space has its own source space. - Experiments on sample-based NAS are limited to NB101. Extending this (for example to NB201) would give a better asse
Generalizing NAS predictors to cover multiple search spaces is essential and an important step forward. For the most part, the paper is easy to read and follow early on. Figures 1-3 especially are nicely done. There is detailed ablation on the design aspects of FLAN. Transfer experiments are performed, as is search.
There are issues with the contributions and statements made in this manuscript: First, DGF: The authors point out that "GCNs are prone to an over-smoothing problem", although really this issue affects Graph Neural Networks (GNNs) in general. The authors then attempt to validate the efficacy of FLAN's predictor using the DGF and GAT in Table 1. I am not convinced by these experiments. GCN was proposed in 2016 and since then other GNN-types like GAT, GIN [1], GATv2 [2], etc., all of whose manuscr
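The reviews above refer to Kendall tau correlation as the measure of how well a predictor ranks architectures. As a reference point, here is a minimal sketch of that metric in Python. It computes the tau-a variant (no tie correction), which may differ from the exact variant used in the paper. The function name `kendall_tau` and the accuracy values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for (p1, a1), (p2, a2) in combinations(zip(predicted, actual), 2):
        sign = (p1 - p2) * (a1 - a2)
        if sign > 0:
            concordant += 1  # predictor orders this pair correctly
        elif sign < 0:
            discordant += 1  # predictor orders this pair incorrectly

    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up accuracies for four architectures (illustrative values only).
predicted_acc = [0.90, 0.92, 0.94, 0.88]
true_acc = [0.91, 0.90, 0.95, 0.89]
tau = kendall_tau(predicted_acc, true_acc)  # 5 concordant, 1 discordant pair
```

A value of 1 means the predictor ranks every pair of architectures in the same order as their measured accuracy, and -1 means every pair is reversed. Only the ordering matters, so a predictor can score well even if its absolute accuracy estimates are off.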
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Time Series Analysis and Forecasting
