SWAP-NAS: Sample-Wise Activation Patterns for Ultra-fast NAS
Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang

TL;DR
SWAP-NAS introduces a training-free metric based on sample-wise activation patterns that significantly improves neural architecture search efficiency and accuracy across multiple benchmarks, reducing search time to minutes.
Contribution
The paper proposes SWAP-Score, a novel high-performance training-free metric based on activation patterns, outperforming existing metrics and enabling ultra-fast NAS.
Findings
SWAP-Score outperforms 15 existing metrics on NAS-Bench datasets.
Regularisation of SWAP-Score improves correlation to true performance.
SWAP-NAS achieves competitive results in minutes on CIFAR-10 and ImageNet.
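The correlation claims above compare a training-free score against trained accuracy over a set of architectures. A minimal sketch of such a comparison follows, assuming Spearman rank correlation (a common choice; this summary does not state which coefficient the paper reports). The function name `spearman_rho` and the toy numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation for inputs without ties (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort turns values into 0-based ranks (valid only without ties)
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    # Spearman's rho is the Pearson correlation of the ranks
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical metric scores and accuracies for five architectures (made-up values)
toy_scores = [3.1, 1.2, 4.8, 2.0, 5.5]
toy_accuracy = [0.91, 0.85, 0.94, 0.89, 0.93]

rho = spearman_rho(toy_scores, toy_accuracy)
print(round(rho, 3))  # 0.9 for these toy values
```

A value near 1 means the metric ranks architectures almost the same way as their trained accuracy does, which is the property a training-free metric needs for search.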
Abstract
Training-free metrics (a.k.a. zero-cost proxies) are widely used to avoid resource-intensive neural network training, especially in Neural Architecture Search (NAS). Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size…
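To make the idea of scoring a network by its activation patterns over a batch concrete, here is a minimal sketch. It assumes a plain ReLU MLP and assumes the score is the number of distinct binary patterns when each hidden unit is described by which samples in the batch activate it. That reading of "sample-wise" is an interpretation of the name rather than the authors' code, and `swap_score` and all toy values are hypothetical.

```python
import numpy as np

def swap_score(weights, biases, batch):
    """Count distinct per-unit activation patterns across a batch (illustrative)."""
    h = batch
    patterns = []
    for W, b in zip(weights, biases):
        pre = h @ W + b              # pre-activations, shape (n_samples, n_units)
        patterns.append(pre > 0)     # True where the ReLU unit is active
        h = np.maximum(pre, 0.0)     # ReLU output feeds the next layer
    # One row per hidden unit, one column per sample
    units_by_samples = np.concatenate(patterns, axis=1).T.astype(np.uint8)
    # The score is the number of unique rows
    return int(np.unique(units_by_samples, axis=0).shape[0])

# Hand-checkable case: 1 input, 3 units, 2 samples.
# Units 1 and 2 fire only on the first sample, unit 3 only on the second.
toy_W = [np.array([[1.0, 1.0, -1.0]])]
toy_b = [np.zeros(3)]
toy_x = np.array([[1.0], [-1.0]])
toy_score = swap_score(toy_W, toy_b, toy_x)  # 2 distinct patterns

# Randomly initialised two-layer network, no training involved
rng = np.random.default_rng(0)
sizes = [8, 16, 16]
rand_W = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
rand_b = [np.zeros(n) for n in sizes[1:]]
rand_x = rng.standard_normal((32, 8))
rand_score = swap_score(rand_W, rand_b, rand_x)
print(toy_score, rand_score)
```

Under this reading the score is bounded by the number of hidden units rather than by the batch size, and it relies on ReLU's on/off behaviour, which matches the reviewers' remark that the method targets ReLU-based networks.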
Peer Reviews
Decision: ICLR 2024 spotlight
Strengths:
- The proposed SWAP-Score and regularised SWAP-Score show much stronger correlations with ground-truth performance than existing training-free metrics on different spaces and tasks.
- The regularised SWAP-Score can enable model size control during search and can further improve correlation in cell-based search spaces.
- When integrated with an evolutionary search algorithm as SWAP-NAS, a combination of ultra-fast architecture search and highly competitive performance can be achieved o
- The paper only gives numerical results on NAS search spaces and does not show or analyze any searched neural architectures to back the claim that Sample-Wise Activation Patterns measure the network's expressivity more accurately.
1. The proposed metric is efficient and has a good correlation with the ground truth performance across various search spaces and tasks, beating all existing training-free metrics in public benchmarks.
2. The metric combined with the evolutionary search method can achieve good performance with extremely small search costs.
1. Missing references: "PINAT: A Permutation INvariance Augmented Transformer for NAS Predictor" (AAAI 2023) and "TNASP: A Transformer-based NAS Predictor with a Self-evolution Framework" (NeurIPS 2021). Please cite and compare with them in Table 1 and Table 2.
2. The method is proposed for ReLU-based networks; it has not been shown to work well on networks that use other nonlinear activation functions.
- Robust Experimental Results: Upon meticulously examining and executing the code provided in the supplementary files, I affirmed the robustness of the experimental results by myself. The effectiveness of the proposed SWAP-NAS has been convincingly validated through my independent verification.
- Comprehensive Experimental Validation: The authors have conducted extensive experiments across multiple NAS-Bench datasets, demonstrating the clear superiority of SWAP-NAS. This broad-spectrum analysis
- Limited Applicability to ReLU-based Networks: One limitation of this approach is its dependency on neural networks using ReLU activations. While ReLU is widely used, it represents a subset of possible activation functions, which narrows the scope of application and might not encompass the full diversity of network architectures.
- Similarity to NWOT Format: It's worth noting that the format of the proposed zero-cost proxy shares similarities with existing methods like NWOT. While not necessar
Taxonomy
Topics: Parallel Computing and Optimization Techniques
