Training-free Neural Architecture Search for RNNs and Transformers
Aaron Serianni (Princeton University), Jugal Kalita (University of Colorado at Colorado Springs)

TL;DR
This paper introduces a training-free metric called hidden covariance for RNNs, demonstrating its effectiveness in neural architecture search, and highlights the importance of jointly optimizing search space and metrics for transformers.
Contribution
The paper develops a novel training-free metric for RNNs and provides insights into optimizing transformer search spaces for neural architecture search.
Findings
Hidden covariance outperforms existing metrics in predicting RNN performance.
Simplifying the transformer search space improves NAS efficiency.
Joint development of search space and metrics is crucial for effective NAS.
Abstract
Neural architecture search (NAS) has allowed for the automatic creation of new and effective neural network architectures, offering an alternative to the laborious process of manually designing complex architectures. However, traditional NAS algorithms are slow and require immense amounts of computing power. Recent research has investigated training-free NAS metrics for image classification architectures, drastically speeding up search algorithms. In this paper, we investigate training-free NAS metrics for recurrent neural network (RNN) and BERT-based transformer architectures, targeted towards language modeling tasks. First, we develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture and significantly outperforms existing training-free metrics. We experimentally evaluate the effectiveness of the hidden covariance metric…
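The abstract as quoted here does not define how hidden covariance is computed, so the following is only a minimal sketch of what a covariance-based training-free score could look like. It assumes the score is derived from the correlation between the hidden states an untrained RNN produces for different inputs in a minibatch, scored through the eigenvalues of that correlation matrix. The function names (`init_rnn`, `final_hidden_states`, `covariance_score`), the plain tanh RNN, and the stabilizing constant `k` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def init_rnn(input_dim, hidden_dim, rng):
    """Randomly initialize a plain tanh RNN (hypothetical stand-in architecture)."""
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W_x": rng.normal(0.0, scale, (input_dim, hidden_dim)),
        "W_h": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "b": np.zeros(hidden_dim),
    }


def final_hidden_states(params, x):
    """Run the untrained RNN over x of shape (batch, time, input_dim).

    Returns the last hidden state for each sequence, shape (batch, hidden_dim).
    """
    batch, steps, _ = x.shape
    h = np.zeros((batch, params["b"].shape[0]))
    for t in range(steps):
        h = np.tanh(x[:, t] @ params["W_x"] + h @ params["W_h"] + params["b"])
    return h


def covariance_score(hidden, k=1e-5):
    """Assumed scoring rule: penalize correlated hidden states across inputs.

    Builds the (batch, batch) correlation matrix between per-input hidden
    states and sums log(lambda + k) + 1 / (lambda + k) over its eigenvalues.
    Higher (less negative) scores mean the inputs are better separated.
    """
    corr = np.corrcoef(hidden)  # rows are treated as variables, one per input
    eigvals = np.clip(np.linalg.eigvalsh(corr), 0.0, None)
    return float(-np.sum(np.log(eigvals + k) + 1.0 / (eigvals + k)))


rng = np.random.default_rng(0)
params = init_rnn(input_dim=4, hidden_dim=16, rng=rng)

# A minibatch of distinct random sequences versus one sequence repeated 8 times.
diverse = rng.normal(size=(8, 5, 4))
identical = np.repeat(diverse[:1], 8, axis=0)

score_diverse = covariance_score(final_hidden_states(params, diverse))
score_identical = covariance_score(final_hidden_states(params, identical))
print(score_diverse, score_identical)
```

Under these assumptions, a network that maps different inputs to nearly identical hidden states receives a much lower score than one that separates them, and no gradient step is needed to compute either score. In a search loop, each candidate architecture would be scored this way at initialization and the candidates ranked by score.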
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Linear Layer · Layer Normalization · Softmax · Adam
