Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization

Luyao Gao; Jianchun Liu; Hongli Xu; Sun Xu; Qianpiao Ma; Liusheng Huang

arXiv:2501.12388·cs.DC·January 22, 2025

Open Access

TL;DR

COACH is a framework that optimizes end-cloud collaborative inference by minimizing pipeline bubbles, yielding up to 1.7x faster inference and 2.1x higher system throughput while maintaining comparable accuracy.

Contribution

The paper introduces COACH, a novel near bubble-free pipeline optimization framework with offline and online components for improved DNN inference in end-cloud collaboration.

Findings

01. Up to 1.7x faster inference compared to baselines.

02. Achieves 2.1x higher system throughput.

03. Maintains comparable accuracy with improved efficiency.

Abstract

End-cloud collaboration offers a promising strategy to enhance the Quality of Service (QoS) in DNN inference by offloading portions of the inference workload from end devices to cloud servers. Despite this potential, complex model architectures and dynamic network conditions introduce numerous bubbles (i.e., idle waiting time) in pipeline execution, resulting in inefficient resource utilization and degraded QoS. To address these challenges, we introduce a novel framework named COACH, designed for near bubble-free pipeline collaborative inference, thereby achieving low inference latency and high system throughput. First, COACH employs an offline component that uses an efficient recursive divide-and-conquer algorithm to optimize both model partitioning and transmission quantization, aiming to minimize the occurrence of pipeline bubbles. Subsequently, the…
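
To make the notion of a pipeline bubble concrete, the following is a minimal sketch, not COACH's algorithm. It assumes a hypothetical three-stage pipeline (end-device compute, transmission, cloud compute), a fixed per-task time at each stage, and unbounded buffering between stages. The function name `simulate_pipeline` and all timing values are illustrative assumptions rather than anything taken from the paper.

```python
from typing import List, Optional, Tuple


def simulate_pipeline(stage_times: List[float], num_tasks: int) -> Tuple[float, List[float]]:
    """Simulate a linear pipeline in which every task takes stage_times[s] at stage s.

    Assumptions (illustrative only): tasks are all available at time 0,
    each stage handles one task at a time, and buffers between stages
    are unbounded.

    Returns (makespan, bubbles), where bubbles[s] is the idle time of
    stage s between its first start and its last finish.
    """
    num_stages = len(stage_times)
    finish = [0.0] * num_stages  # finish time of the latest task at each stage
    first_start: List[Optional[float]] = [None] * num_stages
    busy = [0.0] * num_stages    # total working time of each stage

    for _ in range(num_tasks):
        ready = 0.0  # time at which this task's input to the next stage is available
        for s, duration in enumerate(stage_times):
            # A stage starts once its input is ready and it has finished the previous task.
            start = max(ready, finish[s])
            if first_start[s] is None:
                first_start[s] = start
            finish[s] = start + duration
            busy[s] += duration
            ready = finish[s]

    # Bubble = active window of the stage minus the time it actually worked.
    bubbles = [finish[s] - first_start[s] - busy[s] for s in range(num_stages)]
    return finish[-1], bubbles


# Same total work per task (6 time units), split unevenly vs. evenly.
unbalanced = simulate_pipeline([2.0, 1.0, 3.0], num_tasks=4)
balanced = simulate_pipeline([2.0, 2.0, 2.0], num_tasks=4)
print(unbalanced)
print(balanced)
```

Under these assumptions, the uneven split finishes four tasks at time 15 with 3 time units of idle waiting in the middle stage, while the even split finishes at time 12 with no bubbles. This is the intuition behind choosing partition points and transmission sizes so that stage times are balanced.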

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Big Data and Business Intelligence