Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

Marcel Wagenländer; Otto White; Britannio Jarrett; Pedro Silvestre; Yanda Tao; Guo Li; Huanzhou Zhu; Lluís Vilanova; Peter Pietzuch

arXiv:2604.15186·cs.DC·April 17, 2026

TL;DR

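Scepsy is a serving system that schedules multi-LLM agentic workflows on GPU clusters. It relies on the observation that each LLM's share of total execution time stays comparatively stable across runs, and uses those shares to allocate GPUs for a target throughput with low latency.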

Contribution

Scepsy introduces an approach that uses aggregate LLM execution shares and a lightweight predictor to choose GPU allocations for agentic workflows.

Findings

01. Achieves up to 2.4x higher throughput

02. Reduces latency by up to 27x

03. Outperforms systems optimizing LLMs independently

Abstract

Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging because they can be defined using arbitrary agentic frameworks and exhibit unpredictable execution times: execution may branch, fan out, or recur in data-dependent ways. Since LLMs in workflows often outnumber available GPUs, their execution also leads to GPU oversubscription. We describe Scepsy, a new agentic serving system that efficiently schedules arbitrary multi-LLM agentic workflows onto a GPU cluster. Scepsy exploits the insight that, while agentic workflows have unpredictable end-to-end latencies, the shares of each LLM's total execution times are comparatively stable across executions. Scepsy decides on GPU allocations based on these aggregate shares: first, it profiles the LLMs under…
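To make the central observation concrete, the following minimal Python sketch computes aggregate execution shares from a few workflow runs whose end-to-end times differ widely. Everything in it is an assumption made for illustration: the LLM names, the timing numbers, the function names, and in particular the proportional split of GPUs, which is only a naive stand-in and not the allocation method Scepsy uses (the abstract above is cut off before describing it).

```python
from collections import defaultdict


def aggregate_shares(runs):
    """Return each LLM's share of total execution time, summed over all runs.

    `runs` is a list of dicts mapping a (hypothetical) LLM name to the
    seconds that LLM spent executing in one workflow run.
    """
    totals = defaultdict(float)
    for run in runs:
        for llm, seconds in run.items():
            totals[llm] += seconds
    grand_total = sum(totals.values())
    return {llm: t / grand_total for llm, t in totals.items()}


def proportional_gpu_budget(shares, num_gpus):
    """Naive illustration only: split GPUs in proportion to the shares.

    The result may be fractional, reflecting that several LLMs can end up
    sharing one GPU when LLMs outnumber GPUs.
    """
    return {llm: share * num_gpus for llm, share in shares.items()}


# Made-up runs: end-to-end time varies 8x (5 s to 40 s), but every run
# splits its time between the three LLMs in the same 20/60/20 ratio.
runs = [
    {"planner": 2.0, "coder": 6.0, "critic": 2.0},
    {"planner": 8.0, "coder": 24.0, "critic": 8.0},
    {"planner": 1.0, "coder": 3.0, "critic": 1.0},
]

shares = aggregate_shares(runs)
budget = proportional_gpu_budget(shares, num_gpus=2)
```

In this toy data the shares are perfectly stable by construction; in real traces they would only be approximately so, which is the property the abstract describes as "comparatively stable".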
