LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

Mojan Javaheripi; Gustavo H. de Rosa; Subhabrata Mukherjee; Shital Shah; Tomasz L. Religa; Caio C. T. Mendes; Sebastien Bubeck; Farinaz Koushanfar; Debadeepta Dey

arXiv:2203.02094 · cs.LG · October 19, 2022 · 5 citations

TL;DR

LiteTransformerSearch (LTS) is a training-free neural architecture search method that finds Transformer architectures with a favorable trade-off between perplexity and hardware constraints such as latency and peak memory. It uses the number of decoder parameters as a proxy for perplexity, and its search can run on a wide range of target devices.

Contribution

The paper presents LTS, a novel training-free NAS algorithm that exploits the high rank correlation between the number of decoder parameters and task performance, enabling fast, device-specific optimization of Transformer architectures without any model training.
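To make the proxy concrete, here is a minimal Python sketch. It assumes that "decoder parameters" means the trainable weights inside the stacked decoder blocks of a GPT-2-style model (attention, feed-forward, and LayerNorm), excluding the token/position embeddings and the output softmax layer. `DecoderConfig`, `decoder_params`, and `spearman` are hypothetical helpers for illustration, not the Archai API. Validating the proxy then amounts to a Spearman rank correlation between decoder parameter counts and measured perplexities of a set of trained architectures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical GPT-2-style config; field names are illustrative only."""
    n_layer: int   # number of decoder blocks
    d_model: int   # hidden size
    d_inner: int   # feed-forward inner size


def decoder_params(cfg: DecoderConfig) -> int:
    """Parameters in the decoder blocks only (assumed definition:
    embeddings and the output softmax layer are excluded)."""
    d, f = cfg.d_model, cfg.d_inner
    attn = 4 * d * d + 4 * d   # Q, K, V and output projections, with biases
    ffn = 2 * d * f + f + d    # two linear layers, with biases
    norms = 2 * 2 * d          # two LayerNorms per block (scale and shift)
    return cfg.n_layer * (attn + ffn + norms)


def _ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

For GPT-2 small (12 layers, d_model 768, d_inner 3072), `decoder_params` returns 85,054,464, which is GPT-2 small's roughly 124M total minus its embedding and final LayerNorm weights. If the proxy holds, `spearman(param_counts, perplexities)` should be strongly negative, since more decoder parameters should mean lower perplexity.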

Findings

01

Achieves comparable perplexity with faster runtime and lower peak memory utilization.

02

The search runs directly on target devices without requiring GPUs, reducing the carbon footprint of the search.

03

Outperforms the 350M-parameter OPT model in accuracy and efficiency.

Abstract

The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints like peak memory utilization and latency is non-trivial. This is exacerbated by the proliferation of various hardware. We leverage the somewhat surprising empirical observation that the number of decoder parameters in autoregressive Transformers has a high rank correlation with task performance, irrespective of the architecture topology. This observation organically induces a simple Neural Architecture Search (NAS) algorithm that uses decoder parameters as a proxy for perplexity without need for any model training. The search phase of our training-free algorithm, dubbed Lightweight Transformer Search (LTS), can be run directly on target devices since…
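The training-free search phase described in the abstract can be sketched as follows, under stated assumptions. Candidates are drawn at random from a hypothetical GPT-2-style search space (`sample_arch`). Each candidate is scored by its decoder parameter count (higher is assumed to mean lower perplexity) and by a hardware cost supplied by the caller (`cost_fn`, e.g. latency or peak memory measured on the target device). The non-dominated candidates form the frontier. Random sampling is only a stand-in and is not necessarily the search strategy LTS uses. All names here are illustrative.

```python
import random
from typing import Callable, Dict, List, Tuple

Arch = Dict[str, int]
Point = Tuple[int, float, Arch]  # (proxy score, hardware cost, architecture)


def sample_arch(rng: random.Random) -> Arch:
    """Draw one architecture from a hypothetical GPT-2-style search space."""
    d_model = rng.choice([128, 256, 512, 768, 1024])
    return {
        "n_layer": rng.randint(2, 16),
        "d_model": d_model,
        "d_inner": d_model * rng.choice([2, 3, 4]),
    }


def proxy_score(arch: Arch) -> int:
    """Decoder-parameter proxy: per-block attention, feed-forward and
    LayerNorm weights (with biases), embeddings excluded (an assumption)."""
    d, f = arch["d_model"], arch["d_inner"]
    per_block = (4 * d * d + 4 * d) + (2 * d * f + f + d) + 4 * d
    return arch["n_layer"] * per_block


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both axes
    (higher-or-equal proxy AND lower-or-equal cost)."""
    frontier: List[Point] = []
    best_proxy = -1
    # Sweep by increasing cost (highest proxy first on ties); a point survives
    # only if its proxy beats every cheaper-or-equal point seen so far.
    for proxy, cost, arch in sorted(points, key=lambda p: (p[1], -p[0])):
        if proxy > best_proxy:
            frontier.append((proxy, cost, arch))
            best_proxy = proxy
    return frontier


def search(cost_fn: Callable[[Arch], float], n_samples: int = 200,
           seed: int = 0) -> List[Point]:
    """Score sampled architectures without training and return the frontier."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_samples):
        arch = sample_arch(rng)
        points.append((proxy_score(arch), cost_fn(arch), arch))
    return pareto_frontier(points)


if __name__ == "__main__":
    # Placeholder cost: NOT a measured latency, only a toy stand-in so the sketch runs.
    def toy_cost(arch: Arch) -> float:
        return arch["n_layer"] * arch["d_model"] * 1e-3

    for proxy, cost, arch in search(toy_cost):
        print(f"{proxy:>12,d} params  cost={cost:.3f}  {arch}")
```

In a real setting, `cost_fn` would instantiate the candidate model and time a forward pass (or record peak memory) on the target device. Because no candidate is trained, the cost of each step is dominated by that measurement.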

Code & Models

Repositories

microsoft/archai
PyTorch · Official

Videos

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models (SlidesLive)

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Discriminative Fine-Tuning · Weight Decay · Attention Dropout · GPT-2 · Linear Layer · Dense Connections · Residual Connection