Integrated Hardware Architecture and Device Placement Search

Irene Wang; Jakub Tarnawski; Amar Phanishayee; Divya Mahajan

arXiv:2407.13143·cs.LG·July 19, 2024

TL;DR

This paper presents a framework that co-optimizes accelerator architecture and device placement for distributed deep learning training, improving throughput and resource utilization.

Contribution

It introduces an algorithm that jointly optimizes hardware design and device placement to improve training efficiency, which the authors describe as the first work to do so.

Findings

01. Achieves higher throughput than TPUv4 and Spotlight.
02. Effectively balances memory, computation, and data distribution.
03. Provides open-source implementation of the proposed framework.

Abstract

Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization of determining the optimal architecture and device placement strategy through novel algorithms, improving the balance of computational resources, memory usage, and data distribution. Our architecture search leverages tensor and vector units, determining their quantity and dimensionality, and on-chip and off-chip memory configurations. It also determines the microbatch size and decides whether to recompute or stash activations, balancing the memory footprint of training and storage size. For each explored architecture configuration, we use an Integer Linear Program (ILP) to find the optimal schedule for executing operators on the accelerator. The ILP results then integrate with a dynamic…
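The search described in the abstract can be pictured as a loop over candidate configurations that keeps the best feasible one. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Config` fields, every constant, and the closed-form `throughput` estimate are hypothetical, and the toy estimate stands in for the ILP schedule that the paper computes per architecture. It only illustrates how microbatch size and the recompute-versus-stash choice trade memory footprint against compute.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    tensor_cores: int   # number of tensor units (hypothetical knob)
    core_dim: int       # width of each square tensor unit, in processing elements
    microbatch: int     # samples per microbatch
    recompute: bool     # True: recompute activations, False: stash them


# Toy workload and budgets. All values are invented for illustration.
FLOPS_PER_SAMPLE = 6e9        # forward + backward work per sample
ACT_BYTES_PER_SAMPLE = 2e8    # activation bytes kept per sample when stashing
WEIGHT_BYTES = 4e9            # weights plus optimizer state
MEMORY_BUDGET = 8e9           # off-chip memory capacity in bytes
AREA_BUDGET = 16384           # total processing elements allowed
CLOCK_HZ = 1e9
# Assumes backward costs twice forward, so one extra forward pass is 4/3 work.
RECOMPUTE_OVERHEAD = 4 / 3


def memory_bytes(cfg: Config) -> float:
    # Assumption: recomputation keeps only 10% of the activations.
    kept = 0.1 if cfg.recompute else 1.0
    return WEIGHT_BYTES + kept * ACT_BYTES_PER_SAMPLE * cfg.microbatch


def throughput(cfg: Config) -> float:
    """Samples per second under the toy model, or 0.0 if infeasible."""
    pes = cfg.tensor_cores * cfg.core_dim ** 2
    if pes > AREA_BUDGET or memory_bytes(cfg) > MEMORY_BUDGET:
        return 0.0
    # Assumption: wide units need larger microbatches to stay busy.
    utilization = min(1.0, cfg.microbatch / (cfg.core_dim / 8))
    flops_per_sec = 2 * pes * CLOCK_HZ * utilization  # 2 FLOPs per MAC
    work = FLOPS_PER_SAMPLE * (RECOMPUTE_OVERHEAD if cfg.recompute else 1.0)
    return flops_per_sec / work


def all_configs():
    space = product([1, 2, 4, 8], [32, 64, 128], [1, 2, 4, 8, 16, 32], [False, True])
    return [Config(*choice) for choice in space]


def search() -> Config:
    # Exhaustive search; the first configuration with the best score wins ties.
    return max(all_configs(), key=throughput)


if __name__ == "__main__":
    best = search()
    print(best, throughput(best), memory_bytes(best))
```

In this toy setup a microbatch of 32 on a single 128-wide unit exceeds the memory budget when activations are stashed and fits only with recomputation, which is the kind of trade-off the abstract describes. A real search would replace `throughput` with a per-operator schedule and add the placement step across accelerators.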

Code & Models

Repositories

msr-fiddle/phaze · PyTorch · Official

Taxonomy

Topics: VLSI and Analog Circuit Testing · Embedded Systems Design Techniques · VLSI and FPGA Design Techniques