Training-free Neural Architecture Search for RNNs and Transformers

Aaron Serianni (Princeton University); Jugal Kalita (University of Colorado at Colorado Springs)

arXiv:2306.00288 · cs.LG · June 2, 2023 · 1 citation

TL;DR

This paper introduces a training-free metric called hidden covariance for RNNs, demonstrating its effectiveness in neural architecture search, and highlights the importance of jointly optimizing search space and metrics for transformers.

Contribution

The paper develops a novel training-free metric for RNNs and provides insights into optimizing transformer search spaces for neural architecture search.

Findings

1. Hidden covariance outperforms existing training-free metrics in predicting RNN performance.
2. Simplifying the transformer search space improves NAS efficiency.
3. Joint development of search space and metrics is crucial for effective NAS.
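
The first finding is a claim about ranking: a training-free metric is useful if it orders untrained architectures the same way their trained performance does. A common way to quantify this in the training-free NAS literature is a rank correlation such as Kendall's τ. The sketch below computes Kendall's tau-a in plain Python. The architectures, scores, and losses are made-up toy values, not results from the paper, and this is not claimed to be the paper's evaluation protocol.

```python
from itertools import combinations


def kendall_tau_a(scores, performance):
    """Kendall's tau-a between two equal-length sequences (assumes no ties).

    +1 means the metric orders architectures exactly as their trained
    performance does; -1 means the exact reverse.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        # Positive product: the pair is ordered the same way by both sequences.
        agreement = (scores[i] - scores[j]) * (performance[i] - performance[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up toy values for five architectures (not data from the paper).
trained_loss = [5.1, 4.9, 4.6, 4.4, 5.3]          # lower is better
neg_loss = [-loss for loss in trained_loss]         # flip so higher is better
proxy_a = [-120.0, -95.0, -80.0, -60.0, -150.0]     # hypothetical metric A
proxy_b = [-100.0, -130.0, -70.0, -90.0, -110.0]    # hypothetical metric B

print(kendall_tau_a(proxy_a, neg_loss))  # 1.0: same ordering as trained loss
print(kendall_tau_a(proxy_b, neg_loss))  # 0.4: 7 concordant vs 3 discordant pairs
```

For real benchmark data, `scipy.stats.kendalltau` computes the tie-corrected tau-b variant and is usually the better choice.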

Abstract

Neural architecture search (NAS) has allowed for the automatic creation of new and effective neural network architectures, offering an alternative to the laborious process of manually designing complex architectures. However, traditional NAS algorithms are slow and require immense amounts of computing power. Recent research has investigated training-free NAS metrics for image classification architectures, drastically speeding up search algorithms. In this paper, we investigate training-free NAS metrics for recurrent neural network (RNN) and BERT-based transformer architectures, targeted towards language modeling tasks. First, we develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture and significantly outperforms existing training-free metrics. We experimentally evaluate the effectiveness of the hidden covariance metric…
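
The abstract describes hidden covariance only at a high level, and this page does not give its formula. The sketch below assumes it reuses the eigenvalue-based score of Mellor et al.'s jacobian covariance (NASWOT), computed on the hidden states that an untrained RNN produces for one minibatch rather than on input Jacobians. Treat that as an assumption and check the paper and the aaronserianni/training-free-nas repository for the exact definition. The single-layer tanh RNN, the function names, and the candidate configurations are hypothetical, and a higher score is assumed to mark a more promising architecture. Since nothing is trained, scoring a candidate costs a single forward pass, which is where the speedup of training-free search comes from.

```python
import numpy as np


def hidden_covariance_score(hidden: np.ndarray, k: float = 1e-5) -> float:
    """Training-free score from a (batch, hidden_dim) matrix of hidden states.

    Assumption: reuses the eigenvalue score of Mellor et al.'s jacobian
    covariance, with RNN hidden states in place of input Jacobians.
    """
    # Correlation between the hidden states of different inputs: (batch, batch).
    corr = np.corrcoef(hidden)
    # A correlation matrix is symmetric, so eigvalsh returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(corr)
    # log(x) + 1/x is minimized at x = 1, so each term is at most -1 and the
    # score is highest when the inputs' hidden states are uncorrelated.
    # k keeps the log and the reciprocal finite for zero eigenvalues.
    return float(-np.sum(np.log(eigenvalues + k) + 1.0 / (eigenvalues + k)))


def untrained_rnn_hidden(x: np.ndarray, hidden_dim: int, weight_scale: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Final hidden state of a randomly initialized single-layer tanh RNN.

    Hypothetical stand-in for a cell from a real RNN search space.
    x has shape (batch, seq_len, input_dim); returns (batch, hidden_dim).
    """
    _, seq_len, input_dim = x.shape
    w_xh = rng.normal(0.0, weight_scale / np.sqrt(input_dim), (input_dim, hidden_dim))
    w_hh = rng.normal(0.0, weight_scale / np.sqrt(hidden_dim), (hidden_dim, hidden_dim))
    h = np.zeros((x.shape[0], hidden_dim))
    for t in range(seq_len):
        h = np.tanh(x[:, t, :] @ w_xh + h @ w_hh)
    return h


rng = np.random.default_rng(0)
# Random vectors stand in for one minibatch of embedded token sequences.
x = rng.normal(size=(16, 20, 32))

# Hypothetical candidates: (hidden_dim, weight_scale). No training happens;
# each candidate costs one forward pass.
candidates = [(32, 0.5), (64, 1.0), (128, 1.5), (128, 3.0)]
scores = {c: hidden_covariance_score(untrained_rnn_hidden(x, *c, rng)) for c in candidates}
best = max(scores, key=scores.get)

for cand, score in scores.items():
    print(cand, round(score, 2))
print("selected:", best)
```

Candidates with different hidden sizes are comparable here because the correlation matrix is batch × batch regardless of `hidden_dim`. If the batch is larger than the hidden size, the matrix becomes rank-deficient and the score drops sharply.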

Code & Models

Repositories

aaronserianni/training-free-nas (official)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Linear Layer · Layer Normalization · Softmax · Adam