Generalization Guarantees for Neural Architecture Search with Train-Validation Split
Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

TL;DR
This paper analyzes the statistical properties of neural architecture search with train-validation splits, showing how validation metrics can guide generalization and proposing bounds and methods for effective NAS.
Contribution
It provides new theoretical insights into NAS generalization, bounds for gradient-based search, and connections to kernel and matrix learning methods.
Findings
Refined properties of the validation loss, such as its risk, are indicative of the true test loss.
Gradient descent can find the optimal architecture even when the training error is zero.
Spectral methods can efficiently solve the outer NAS problem.
Abstract
Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters, such as the configuration of the architecture, over the validation data (outer problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the inner problem is often overparameterized and can easily achieve zero loss. Thus, a priori it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the role of the train-validation split. To this aim, this work establishes the following results. (1) We show that refined properties of the validation loss such as risk and…
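The bilevel setup described in the abstract can be illustrated with a small sketch. This is not the authors' method: it assumes a toy random-feature model in which the "architecture" is just the choice of activation, and the names `ACTIVATIONS`, `fit_inner`, and `search` are hypothetical. The inner problem fits minimum-norm weights on the training split, where the model is overparameterized and is expected to interpolate. The outer problem picks the activation with the lowest validation loss.

```python
import numpy as np

# Hypothetical candidate "architectures": the activation of a random-feature model.
ACTIVATIONS = {
    "relu": lambda z: np.maximum(z, 0.0),
    "tanh": np.tanh,
    "sin": np.sin,
}

def features(x, w, b, act):
    # Random-feature map act(x W^T + b), one column per hidden unit.
    return act(x @ w.T + b)

def fit_inner(phi_train, y_train):
    # Inner problem: minimum-norm least squares on the training split.
    # With more features than samples this typically interpolates the labels.
    theta, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    return theta

def mse(phi, theta, y):
    return float(np.mean((phi @ theta - y) ** 2))

def search(x_tr, y_tr, x_val, y_val, w, b):
    # Outer problem: pick the activation with the smallest validation loss.
    results = {}
    for name, act in ACTIVATIONS.items():
        phi_tr = features(x_tr, w, b, act)
        phi_val = features(x_val, w, b, act)
        theta = fit_inner(phi_tr, y_tr)
        results[name] = (mse(phi_tr, theta, y_tr), mse(phi_val, theta, y_val))
    best_name = min(results, key=lambda k: results[k][1])
    return best_name, results

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_train, n_val, dim, width = 30, 30, 5, 300
x = rng.normal(size=(n_train + n_val, dim))
y = np.tanh(x @ rng.normal(size=dim)) + 0.1 * rng.normal(size=n_train + n_val)
w = rng.normal(size=(width, dim))
b = rng.normal(size=width)

best_name, results = search(x[:n_train], y[:n_train], x[n_train:], y[n_train:], w, b)
for name, (train_loss, val_loss) in results.items():
    print(f"{name}: train={train_loss:.2e}  val={val_loss:.3f}")
print("selected:", best_name)
```

In this sketch the training loss should be close to zero for every candidate, so it carries no information about which activation to prefer; only the held-out validation loss separates them.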
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Advanced Neural Network Applications
