Hydra: A System for Large Multi-Model Deep Learning

Kabir Nagrecha; Arun Kumar

arXiv:2110.08633 · cs.DC · August 5, 2022

TL;DR

Hydra is a system for training multiple large deep learning models efficiently on commodity GPUs. It optimizes execution and resource management, and reports 50-100% higher training throughput than DeepSpeed and GPipe.

Contribution

Hydra introduces a holistic approach to optimize multi-model deep learning workloads, combining model-parallel execution with scalable parameter offloading and task-parallel scheduling.
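To make the combination concrete, here is a minimal toy simulation in Python. It is not Hydra's API or its actual scheduler: the `Model` class, the `schedule` function, and the greedy "most remaining work first" rule are all assumptions chosen for illustration. Each model is split into shards that must run in order (the model-parallel part), only the running shard is assumed to occupy a device while the rest stay in host memory (the offloading part), and shards from different models are interleaved across free devices (the task-parallel part).

```python
import heapq
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for one large model split into sequential shards."""
    name: str
    shard_times: list  # run time of each shard; shards execute strictly in order
    next_shard: int = 0

    def remaining(self):
        return sum(self.shard_times[self.next_shard:])

    def done(self):
        return self.next_shard >= len(self.shard_times)


def schedule(models, num_devices):
    """Greedy simulation (an assumed policy, not Hydra's): whenever a device is
    free, run the next shard of the waiting model with the most remaining work.
    Returns (makespan, trace) where trace holds (start, device, model, shard)."""
    free_devices = list(range(num_devices))
    running = []  # min-heap of (finish_time, device, model_index)
    waiting = {i for i, m in enumerate(models) if not m.done()}
    now = 0.0
    trace = []
    while waiting or running:
        # Hand out free devices; a model holds at most one device at a time,
        # and only its current shard is assumed to be resident on that device.
        while free_devices and waiting:
            i = max(waiting, key=lambda j: (models[j].remaining(), -j))
            waiting.remove(i)
            dev = free_devices.pop(0)
            m = models[i]
            trace.append((now, dev, m.name, m.next_shard))
            heapq.heappush(running, (now + m.shard_times[m.next_shard], dev, i))
        # Advance simulated time to the next shard completion.
        now, dev, i = heapq.heappop(running)
        free_devices.append(dev)
        free_devices.sort()
        models[i].next_shard += 1
        if not models[i].done():
            waiting.add(i)
    return now, trace


if __name__ == "__main__":
    jobs = [Model("A", [1.0, 1.0]), Model("B", [1.0, 1.0])]
    makespan, trace = schedule(jobs, num_devices=2)
    print(makespan)
    for event in trace:
        print(event)
```

In this sketch a single model gains nothing from a second device, because its shards are sequential; the speedup comes from running several models at once, which is the multi-model setting the paper targets.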

Findings

01 Hydra achieves 50-100% higher throughput than DeepSpeed and GPipe.

02 It enables training of 6-billion-parameter models on a single commodity GPU.

03 Hydra demonstrates near-linear scaling in multi-GPU setups.

Abstract

Scaling up model depth and size is now a common approach to raise accuracy in many deep learning (DL) applications, as evidenced by the widespread success of multi-billion or even trillion parameter models in natural language processing (NLP) research. Despite success in DL research and at major technology companies, broader practical adoption of such large models among domain scientists and businesses is still bottlenecked by GPU memory limits, high training costs, and low GPU availability, even on public clouds. Model selection needs further compound these resource challenges: users often need to compare dozens of models with different hyper-parameters or neural architectures to suit their specific task and dataset. In this paper, we present Hydra, a system designed to tackle such challenges by enabling out-of-the-box scaling for multi-large-model DL workloads on even commodity GPUs…

Code & Models

Repositories

knagrecha/hydra (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and Data Classification

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Dropout · Dense Connections · GPipe · Discriminative Fine-Tuning · Multi-Head Attention · Weight Decay