Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads

Jung-Hoon Kim; Sungyeob Yoo; Seungjae Moon; Joo-Young Kim

arXiv:2206.03060·cs.AR·June 8, 2022

Open Access

TL;DR

This paper introduces a scalable systolic-vector architecture with heterogeneity-aware scheduling for dynamic ML workloads in cloud datacenters, reporting 10.9x higher throughput than GPUs and 30.17x better energy efficiency.

Contribution

It proposes a novel heterogeneous architecture with a unified model format and a scheduling algorithm that optimizes resource utilization for diverse DNN workloads.

Findings

1. Achieves 10.9x higher throughput than GPUs.
2. Attains 30.17x better energy efficiency.
3. Heterogeneity-aware scheduling boosts throughput by 81%.

Abstract

As artificial intelligence (AI) and machine learning (ML) technologies disrupt a wide range of industries, cloud datacenters face ever-increasing demand for inference workloads. However, conventional CPU-based servers cannot handle the excessive computational requirements of deep neural network (DNN) models, while GPU-based servers suffer from huge power consumption and high operating costs. In this paper, we present a scalable systolic-vector architecture that can cope with dynamically changing DNN workloads in cloud datacenters. We first devise a lightweight DNN model description format called unified model format (UMF) that enables general model representation and fast decoding in a hardware accelerator. Based on this model format, we propose a heterogeneous architecture that features a load balancer that performs high-level workload distribution and multiple systolic-vector clusters, in…
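The abstract excerpt above is cut off before it describes how the load balancer distributes work, so the following is only an illustrative sketch of what heterogeneity-aware dispatch can look like, not the authors' algorithm. It assumes a simple greedy earliest-finish-time policy, and every name in it (`Cluster`, `dispatch`, the cluster names, and the relative speed numbers) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster model (not from the paper)."""
    name: str
    speed: dict  # workload kind -> assumed relative speed (work units per time unit)
    busy_until: float = 0.0
    assigned: list = field(default_factory=list)


def dispatch(requests, clusters):
    """Greedy earliest-finish-time assignment; an assumed policy for illustration."""
    for kind, work in requests:
        # Pick the cluster that would finish this request soonest,
        # given its current backlog and its speed on this workload kind.
        best = min(clusters, key=lambda c: c.busy_until + work / c.speed[kind])
        best.busy_until += work / best.speed[kind]
        best.assigned.append((kind, work))
    return clusters


# Two made-up clusters with opposite strengths.
clusters = [
    Cluster("systolic_heavy", {"cnn": 4.0, "rnn": 1.0}),
    Cluster("vector_heavy", {"cnn": 1.0, "rnn": 4.0}),
]
requests = [("cnn", 8.0), ("rnn", 8.0), ("cnn", 4.0), ("rnn", 4.0)]

dispatch(requests, clusters)
for c in clusters:
    print(c.name, c.assigned, c.busy_until)
```

With these made-up speeds, the CNN requests land on the first cluster and the RNN requests on the second, and both finish at the same time. A dispatcher that ignored the speed table would place some requests on the slower cluster, which is the kind of imbalance a heterogeneity-aware policy is meant to avoid.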

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing