Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, and Divya Mahajan

TL;DR
This paper introduces a workload-aware hardware accelerator search method for distributed deep learning training, optimizing for throughput under power and area constraints, and addressing both single-device and distributed scenarios.
Contribution
It presents a novel heuristic-based search approach that considers distributed training complexities, enabling efficient hardware design for diverse DNN workloads.
Findings
Achieves 12x higher throughput than inference-only designs.
Converges 31x faster than recent inference-focused work.
Improves throughput by 12% over the TPU architecture.
Abstract
In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor model parallel scenarios, the latter being addressed for the first time. The search optimizes accelerators for training-relevant metrics such as throughput/TDP under fixed area and power constraints. However, with the proliferation of specialized architectures and complex distributed training mechanisms, the design space for hardware accelerators is very large. Prior work has tried to tackle this by reducing the search space either to execution on a single accelerator, and then only for inference, or to tuning the architecture for specific layers (e.g., convolution). Instead, we take a unique heuristic-based critical path-based…
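To make the optimization target in the abstract concrete, here is a minimal sketch of picking the design point with the best throughput/TDP subject to fixed area and power budgets. It is not the paper's heuristic, critical-path-based search: it is a plain exhaustive sweep over a tiny grid. The `Config` fields, the three cost models, and every coefficient are assumptions invented for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """Hypothetical accelerator design point (not the paper's parameterization)."""
    num_cores: int
    pe_dim: int    # each core holds a pe_dim x pe_dim compute array
    sram_mb: int   # on-chip buffer per core


# Toy analytical models. All coefficients are made up for illustration.
def area_mm2(c: Config) -> float:
    return c.num_cores * (0.002 * c.pe_dim ** 2 + 0.5 * c.sram_mb)


def tdp_watts(c: Config) -> float:
    return c.num_cores * (0.001 * c.pe_dim ** 2 + 0.1 * c.sram_mb) + 10.0


def throughput(c: Config) -> float:
    compute_bound = float(c.num_cores * c.pe_dim ** 2)
    memory_bound = 4000.0 * c.num_cores * c.sram_mb
    return min(compute_bound, memory_bound)


def score(c: Config) -> float:
    """Objective named in the abstract: throughput per unit TDP."""
    return throughput(c) / tdp_watts(c)


def search(grid: Sequence[Config],
           max_area_mm2: float,
           max_tdp_watts: float) -> Optional[Config]:
    """Return the feasible config with the highest throughput/TDP, or None."""
    feasible = [c for c in grid
                if area_mm2(c) <= max_area_mm2 and tdp_watts(c) <= max_tdp_watts]
    return max(feasible, key=score) if feasible else None


# A small hypothetical design space: 3 x 3 x 3 = 27 candidates.
GRID = [Config(n, d, s)
        for n, d, s in product((1, 2, 4), (32, 64, 128), (4, 8, 16))]

if __name__ == "__main__":
    best = search(GRID, max_area_mm2=200.0, max_tdp_watts=100.0)
    print(best, None if best is None else round(score(best), 2))
```

An exhaustive sweep is only workable because this grid has 27 points. The abstract's point is that realistic design spaces, especially with distributed training, are far too large for this, which is what motivates a heuristic search.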
Taxonomy
Topics: Cloud Computing and Resource Management · Brain Tumor Detection and Classification · Robotics and Automated Systems
