Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

Zinuo Cai; Hao Wang; Tao Song; Yang Hua; Ruhui Ma; Haibing Guan

arXiv:2307.11339·cs.DC·July 24, 2023

Open Access

TL;DR

Chrion is a system that optimizes recurrent neural network inference by partitioning models and scheduling their computations across CPUs and GPUs in cloud clusters, cutting execution latency by up to 19.4% and GPU memory footprint by 67.5%.

Contribution

The paper formulates model deployment in a CPU-GPU cluster as an NP-hard problem of scheduling directed acyclic graphs on heterogeneous devices, and proposes a method to partition models for efficient execution across those devices.

Findings

01 Up to 19.4% reduction in execution latency.

02 GPU memory footprint reduced by 67.5%.

03 Effective model partitioning improves inference performance.

Abstract

Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct responsibilities to handle serving requests, i.e., general-purpose CPUs for input preprocessing and domain-specific GPUs for forward computation. Recurrent neural networks play an essential role in handling temporal inputs and display distinctive computation characteristics because of their high inter-operator parallelism. Hence, we propose Chrion to optimize recurrent neural network inference by collaboratively utilizing CPUs and GPUs. We formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling problem of directed acyclic graphs on heterogeneous devices. Given an input model in the ONNX format and user-defined SLO requirement,…
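To make the scheduling formulation concrete, here is a minimal sketch of list scheduling a small operator DAG across one CPU and one GPU. It is not Chrion's algorithm, which the truncated abstract does not describe; the greedy earliest-finish-time rule, the operator names (`fwd`, `bwd`, `merge`), the per-device costs, and the `transfer` penalty are all illustrative assumptions. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def greedy_schedule(deps, cost, transfer=0.5):
    """Greedy earliest-finish-time placement (a heuristic, not an optimal solver).

    deps: operator -> list of predecessor operators (the DAG edges).
    cost: operator -> {"cpu": time, "gpu": time}, hypothetical per-device runtimes.
    transfer: assumed fixed penalty when a predecessor ran on the other device.
    """
    device_free = {"cpu": 0.0, "gpu": 0.0}  # time at which each device becomes idle
    placement, finish = {}, {}
    for op in TopologicalSorter(deps).static_order():
        best = None
        for dev in device_free:
            # Inputs are ready once every predecessor has finished,
            # plus the transfer penalty for cross-device edges.
            ready = max(
                (finish[p] + (transfer if placement[p] != dev else 0.0)
                 for p in deps.get(op, ())),
                default=0.0,
            )
            end = max(ready, device_free[dev]) + cost[op][dev]
            if best is None or end < best[0]:
                best = (end, dev)
        finish[op], placement[op] = best
        device_free[best[1]] = best[0]
    return placement, max(finish.values())


# Toy graph: two independent branches feeding one merge operator.
deps = {"fwd": [], "bwd": [], "merge": ["fwd", "bwd"]}
cost = {
    "fwd": {"cpu": 3.0, "gpu": 2.0},
    "bwd": {"cpu": 3.0, "gpu": 2.0},
    "merge": {"cpu": 1.0, "gpu": 0.5},
}
placement, makespan = greedy_schedule(deps, cost)
print(placement, makespan)
```

With these made-up costs, the greedy rule runs the two independent branches on different devices at the same time, which is the kind of inter-operator parallelism the abstract refers to; placing every operator on the GPU would serialize them. Because the underlying problem is NP-hard, a heuristic like this gives no optimality guarantee.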

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Graph Neural Networks · Graph Theory and Algorithms