Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management

Z. Wu; Y. Deng; J. Hu; L. Cui; Z. Zhang; L. Zeng; G. Min

arXiv:2502.20959·cs.DC·July 1, 2025

TL;DR

Cicada is a pipeline optimization framework for serverless ML inference that significantly reduces latency and improves pipeline utilization by decoupling weight loading, optimizing layer construction, and dynamically scheduling resources.

Contribution

Cicada introduces three mechanisms (MiniLoader, WeightDecoupler, and Priority-Aware Scheduler) that together improve serverless inference efficiency.

Findings

01 Reduces end-to-end inference latency by 61.59%.

02 Achieves up to 2.52x speedup in pipeline utilization.

03 Outperforms the state-of-the-art PISeL framework.

Abstract

Serverless computing has emerged as a pivotal paradigm for deploying Deep Learning (DL) models, offering automatic scaling and cost efficiency. However, the inherent cold start problem in serverless ML inference systems, particularly the time-consuming model loading process, remains a significant bottleneck. Pipelined model loading improves efficiency but still suffers from pipeline stalls caused by sequential layer construction and monolithic weight loading. In this paper, we propose Cicada, a novel pipeline optimization framework that coordinates computational, storage, and scheduling resources through three key mechanisms: (1) MiniLoader, which reduces layer construction overhead by opportunistically optimizing parameter initialization; (2) WeightDecoupler, which decouples weight file processing from layer construction, enabling asynchronous weight…
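The idea of decoupling weight loading from layer construction can be illustrated with a toy sketch. This is not Cicada's implementation: `construct_layer`, `load_weights`, `run_pipeline`, and the dictionary-based weight store are hypothetical stand-ins, and each "layer" is reduced to a single scalar multiplication. The sketch only shows the general pattern of starting weight loads asynchronously so that construction of each layer does not wait behind a monolithic load.

```python
from concurrent.futures import ThreadPoolExecutor


def construct_layer(name):
    # Hypothetical stand-in: build the layer's structure without any weights.
    return {"name": name, "weights": None}


def load_weights(name, store):
    # Hypothetical stand-in: read and deserialize the weights of one layer.
    return store[name]


def run_pipeline(layer_names, store, x):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submit every weight load up front; the loads run in background
        # threads, independently of layer construction below.
        pending = {n: pool.submit(load_weights, n, store) for n in layer_names}
        for n in layer_names:
            layer = construct_layer(n)
            # Blocks only if this layer's weights have not arrived yet.
            layer["weights"] = pending[n].result()
            # Toy "inference" step: scale the input by the layer's weight.
            x = x * layer["weights"]
    return x


result = run_pipeline(["a", "b", "c"], {"a": 2, "b": 3, "c": 5}, 1)
```

In a real system the loads would be file or network reads and the layers would be framework modules; the sketch keeps only the ordering relationship between the two stages.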
