SERFLOW: A Cross-Service Cost Optimization Framework for SLO-Aware Dynamic ML Inference

Zongshun Zhang; Ibrahim Matta

arXiv:2510.27182·cs.LG·November 3, 2025

TL;DR

SERFLOW introduces a cost-effective, adaptive framework for ML inference that dynamically offloads model stages across FaaS and IaaS, optimizing resource use and reducing cloud costs amidst real-world uncertainties.

Contribution

The paper models each ML inference query as a multi-stage request and proposes a resource provisioning method that combines serverless (FaaS) and VM-based (IaaS) resources for cost efficiency.

Findings

01. Reduces cloud costs by over 23%

02. Effectively handles long-tail and variable request distributions

03. Balances load across FaaS and IaaS resources

Abstract

Dynamic offloading of Machine Learning (ML) model partitions across different resource orchestration services, such as Function-as-a-Service (FaaS) and Infrastructure-as-a-Service (IaaS), can balance processing and transmission delays while minimizing costs of adaptive inference applications. However, prior work often overlooks real-world factors, such as Virtual Machine (VM) cold starts, requests under long-tail service time distributions, etc. To tackle these limitations, we model each ML query (request) as traversing an acyclic sequence of stages, wherein each stage constitutes a contiguous block of sparse model parameters ending in an internal or final classifier where requests may exit. Since input-dependent exit rates vary, no single resource configuration suits all query distributions. IaaS-based VMs become underutilized when many requests exit early, yet rapidly scaling to…
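
The stage model described in the abstract can be illustrated with a minimal sketch. It assumes each stage has a fixed probability that a request reaching it exits at that stage's classifier. The function names and the example exit rates are hypothetical and are not taken from the paper. The sketch computes the fraction of requests that reach each stage, which shows why later stages see less load when many requests exit early.

```python
from typing import List


def stage_reach_probabilities(exit_rates: List[float]) -> List[float]:
    """Return the fraction of requests that reach each stage.

    exit_rates[i] is the assumed probability that a request which reaches
    stage i exits at that stage's classifier (hypothetical input; the last
    stage should have rate 1.0 because every remaining request exits there).
    """
    reach = []
    remaining = 1.0  # every request reaches the first stage
    for rate in exit_rates:
        reach.append(remaining)
        remaining *= 1.0 - rate  # only non-exiting requests move on
    return reach


def expected_stages(exit_rates: List[float]) -> float:
    """Expected number of stages a request traverses under this model."""
    return sum(stage_reach_probabilities(exit_rates))


if __name__ == "__main__":
    rates = [0.5, 0.5, 1.0]  # illustrative values only
    print(stage_reach_probabilities(rates))
    print(expected_stages(rates))
```

With these illustrative rates, only a quarter of the requests reach the third stage, so resources sized for the first stage would sit largely idle if dedicated to the last one.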

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Big Data and Digital Economy