HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

Rhea Sanjay Sukthanker; Arber Zela; Benedikt Staffler; Aaron Klein; Lennart Purucker; Joerg K.H. Franke; Frank Hutter

arXiv:2405.10299·cs.LG·November 5, 2024

TL;DR

HW-GPT-Bench is a hardware-aware benchmark that uses surrogate models to efficiently evaluate and optimize GPT-2-based language models across multiple hardware metrics and devices.

Contribution

It introduces a surrogate-based benchmarking framework for rapid hardware metric estimation of GPT-2 architectures on diverse devices.

Findings

01. Accurately models latency and energy with calibrated surrogates.

02. Enables fast simulation of multi-objective optimization trajectories.

03. Supports evaluation of models up to 1.55B parameters.
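
A common way to summarize a multi-objective search, such as the one referred to in the findings above, is the Pareto front: the set of candidates that no other candidate beats on every objective at once. The sketch below is a generic illustration and not the benchmark's API. The function name `pareto_front`, the choice of objectives (latency and perplexity, both minimized), and the numbers are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical objective pair: (latency_ms, perplexity), both to be minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # Sorting by the first objective means a point can only be dominated
    # by something that appears earlier in the sorted order.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


# Made-up values, for illustration only.
candidates = [(10.0, 30.0), (12.0, 25.0), (11.0, 32.0), (20.0, 24.0), (25.0, 24.5)]
front = pareto_front(candidates)
print(front)
```

In this toy data, (11.0, 32.0) is dropped because (10.0, 30.0) is both faster and more accurate, and (25.0, 24.5) is dropped because (20.0, 24.0) is.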

Abstract

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that uses surrogate predictions to approximate various hardware metrics, across 13 devices, for architectures in the GPT-2 family with up to 1.55B parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ…
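
"Heteroscedastic" in the abstract means that the measurement noise varies from one configuration to another. One standard way to fit or score a predictor under such noise is a Gaussian negative log-likelihood with a per-sample variance. The sketch below illustrates only that general idea; it is not the paper's surrogate, and the function name `gaussian_nll` and all numbers are assumptions.

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float],
                 mean: Sequence[float],
                 log_var: Sequence[float]) -> float:
    """Mean negative log-likelihood of y_i under N(mean_i, exp(log_var_i))."""
    total = 0.0
    for yi, mi, lvi in zip(y, mean, log_var):
        # Predicting the log-variance keeps the variance positive.
        total += 0.5 * (math.log(2 * math.pi) + lvi + (yi - mi) ** 2 / math.exp(lvi))
    return total / len(y)


# Made-up latency measurements; the second one is much noisier.
y = [1.0, 5.0]
mean = [1.1, 3.0]

fixed_noise = gaussian_nll(y, mean, [0.0, 0.0])            # same variance everywhere
varying_noise = gaussian_nll(y, mean, [0.0, math.log(4)])  # larger variance where the error is larger
print(fixed_noise, varying_noise)
```

Assigning a larger variance to the noisier measurement lowers the loss here, which is why a model trained with this objective is pushed to report higher uncertainty where measurements scatter more.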

Code & Models

Videos

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Discriminative Fine-Tuning · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Dropout