The Case for Co-Designing Model Architectures with Hardware

Quentin Anthony; Jacob Hatef; Deepak Narayanan; Stella Biderman; Stas Bekman; Junqi Yin; Aamir Shafi; Hari Subramoni; Dhabaleswar Panda

arXiv:2401.14489 · cs.DC · February 1, 2024 · 1 citation

TL;DR

This paper emphasizes the importance of co-designing deep learning model architectures with hardware considerations, demonstrating that optimized shapes can significantly boost GPU training throughput without sacrificing accuracy.

Contribution

It provides practical guidelines for designing transformer models optimized for GPU hardware, highlighting the impact of model shape on computational efficiency.

Findings

1. Optimized model shapes increase throughput by up to 39%.
2. Guidelines improve GPU training efficiency for transformer models.
3. Model accuracy is preserved despite shape optimizations.

Abstract

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set of guidelines for users to maximize the runtime performance of their transformer models. These guidelines have been created by carefully considering the impact of various model hyperparameters controlling model shape on the efficiency of the underlying computation kernels executed on the GPU. We find the throughput of models with efficient model shapes is up to 39% higher while preserving accuracy compared to models with a similar number of parameters but with unoptimized shapes.
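
To make "model shape" concrete, the following is a minimal Python sketch, not the paper's method, that compares two hypothetical transformer configurations with a similar parameter count. It rests on assumptions that do not come from the text above: the `TransformerShape` class and both helper functions are invented for illustration, the parameter estimate uses the common 12·h² per-layer approximation (which assumes a 4h-wide MLP and ignores biases and layer norms), and the alignment multiple of 64 is an illustrative default reflecting the general heuristic that GPU matrix-multiply kernels tend to favor dimensions that are multiples of a power of two.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TransformerShape:
    """Hypothetical container for the hyperparameters that fix model shape."""
    hidden_size: int
    num_heads: int
    num_layers: int
    vocab_size: int

    @property
    def head_dim(self) -> int:
        # Per-head dimension; assumes hidden_size is divisible by num_heads.
        return self.hidden_size // self.num_heads


def approx_params(shape: TransformerShape) -> int:
    """Rough parameter count: 12*h^2 per layer plus the embedding matrix.

    Assumes a 4h-wide MLP; ignores biases, layer norms, and positional
    embeddings.
    """
    h = shape.hidden_size
    return 12 * shape.num_layers * h * h + shape.vocab_size * h


def misaligned_dims(shape: TransformerShape, multiple: int = 64) -> List[str]:
    """Return the names of dimensions that are not a multiple of `multiple`.

    The default of 64 is an illustrative assumption, not a value taken
    from the abstract.
    """
    issues = []
    if shape.hidden_size % shape.num_heads != 0:
        issues.append("hidden_size/num_heads")
    elif shape.head_dim % multiple != 0:
        issues.append("head_dim")
    if shape.vocab_size % multiple != 0:
        issues.append("vocab_size")
    return issues


# Two hypothetical shapes that differ only in head count and vocabulary padding.
baseline = TransformerShape(hidden_size=2560, num_heads=32, num_layers=32, vocab_size=50257)
aligned = TransformerShape(hidden_size=2560, num_heads=20, num_layers=32, vocab_size=50304)

for name, shape in [("baseline", baseline), ("aligned", aligned)]:
    print(name, approx_params(shape), shape.head_dim, misaligned_dims(shape))
```

Under this approximation the two shapes differ by well under 0.01% in parameter count, yet only the second has a head dimension and vocabulary size that are multiples of 64. Whether that alignment yields a measurable throughput gain depends on the GPU and kernels in use, so a check like this is a starting point for benchmarking rather than a substitute for it.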

Code & Models

Repositories

eleutherai/gpt-neox (PyTorch, official)

Models

akswelh/NEOX (Hugging Face model)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques

Methods: Sparse Evolutionary Training