Supporting Very Large Models using Automatic Dataflow Graph Partitioning

Minjie Wang; Chien-chin Huang; Jinyang Li

arXiv:1807.08887·cs.DC·February 22, 2019


TL;DR

Tofu is a system that automatically partitions the dataflow graph of a very large deep neural network across multiple GPUs, reducing the per-GPU memory footprint and training faster than alternative approaches for models of that size.

Contribution

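The paper introduces a simple language for describing operator semantics, in which tensors are lambda functions from coordinates to values, and a recursive search algorithm that chooses how to partition each operator so that total communication cost is minimized.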
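As a rough illustration of what a recursive, communication-minimizing search can look like, here is a toy sketch for a single matrix multiplication. It is not Tofu's algorithm: the function names (`step_options`, `best_plan`), the restriction to one operator, and the cost model (elements exchanged between two worker groups) are all assumptions made for this example.

```python
import math
from functools import lru_cache


def step_options(m, k, n):
    """List two-way splits of C = A @ B, with A: (m, k) and B: (k, n).

    Each entry is (name, cost, sub_shape). The cost is a hypothetical count
    of elements that the two worker groups must exchange for that split.
    """
    opts = []
    if m % 2 == 0:
        # Rows of A and C are split; both groups need all of B.
        opts.append(("split_m", k * n, (m // 2, k, n)))
    if n % 2 == 0:
        # Columns of B and C are split; both groups need all of A.
        opts.append(("split_n", m * k, (m, k, n // 2)))
    if k % 2 == 0:
        # The reduction dimension is split; partial outputs must be summed.
        opts.append(("split_k", m * n, (m, k // 2, n)))
    return opts


@lru_cache(maxsize=None)
def best_plan(m, k, n, levels):
    """Return (total_cost, plan) for `levels` rounds of two-way partitioning.

    2 ** levels workers are covered; a cost of math.inf means no valid plan.
    """
    if levels == 0:
        return 0, ()
    best = (math.inf, ())
    for name, cost, (m2, k2, n2) in step_options(m, k, n):
        # Both halves face the same sub-problem, so its cost is counted twice.
        sub_cost, sub_plan = best_plan(m2, k2, n2, levels - 1)
        total = cost + 2 * sub_cost
        if total < best[0]:
            best = (total, (name,) + sub_plan)
    return best


# Three levels of two-way splits correspond to 8 workers.
cost, plan = best_plan(1024, 4096, 4096, 3)
print(cost, plan)
```

The sketch searches exhaustively over one operator. A full dataflow graph adds the constraint that neighboring operators must agree on how shared tensors are split, which this example does not model.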

Findings

01 Enables training of very large CNN and RNN models on an 8-GPU machine.

02 Achieves 25% to 400% speedup over alternative approaches to training very large models.

03 Reduces the per-GPU memory footprint by partitioning the model across multiple GPUs.

Abstract

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators so that it works transparently with a general-purpose deep learning platform like MXNet. To partition each operator automatically, we propose describing the semantics of an operator in a simple language that represents tensors as lambda functions mapping from tensor coordinates to values. To optimally partition the different operators in a dataflow graph, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves a 25% to 400% speedup over alternative approaches to training very large models.
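The idea of representing tensors as functions from coordinates to values can be sketched in a few lines of Python. This is only an illustration of the concept, not Tofu's description language: the helper names (`tensor_from_nested`, `matmul`, `materialize`, `partitioned_matmul`) and the choice to split along output rows are assumptions made for the example.

```python
def tensor_from_nested(data):
    """Wrap a nested list as a function from coordinates to a value."""
    def at(*idx):
        value = data
        for i in idx:
            value = value[i]
        return value
    return at


def matmul(a, b, k_extent):
    """Describe C[i, j] = sum_k A[i, k] * B[k, j] as a coordinate function."""
    return lambda i, j: sum(a(i, k) * b(k, j) for k in range(k_extent))


def materialize(t, rows, cols):
    """Evaluate a 2-D lambda tensor over the given row and column indices."""
    return [[t(i, j) for j in cols] for i in rows]


def partitioned_matmul(a, b, m, k_extent, n, workers):
    """Split the output rows across workers; each evaluates only its slice."""
    c = matmul(a, b, k_extent)
    step = (m + workers - 1) // workers  # rows per worker, rounded up
    out = []
    for w in range(workers):
        rows = range(w * step, min((w + 1) * step, m))
        out.extend(materialize(c, rows, range(n)))
    return out


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2], [0, 1, 3]]
a, b = tensor_from_nested(A), tensor_from_nested(B)

full = materialize(matmul(a, b, 2), range(4), range(3))
split = partitioned_matmul(a, b, 4, 2, 3, workers=2)
print(full == split)  # the partitioned evaluation matches the full one
```

Because the operator is written in terms of output coordinates, it is visible from the description which input coordinates each output slice reads, which is the kind of information a partitioner needs.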


Taxonomy

Methods: Tofu