Pie: A Programmable Serving System for Emerging LLM Applications

In Gim; Zhiyao Ma; Seung-seob Lee; Lin Zhong

arXiv:2510.24051·cs.CL·October 29, 2025

TL;DR

Pie is a flexible, programmable serving system for large language models that allows custom workflows and optimizations, improving latency and throughput for emerging LLM applications.

Contribution

Pie introduces a novel programmable serving architecture using inferlets and WebAssembly, enabling customizable LLM workflows without system modifications.

Findings

01 Matches state-of-the-art performance on standard tasks with minimal latency overhead

02 Significantly improves latency and throughput on agentic workflows

03 Enables application-specific optimizations for LLM serving

Abstract

Emerging large language model (LLM) applications involve diverse reasoning strategies and agentic workflows, straining the capabilities of existing serving systems built on a monolithic token generation loop. This paper introduces Pie, a programmable LLM serving system designed for flexibility and efficiency. Pie decomposes the traditional generation loop into fine-grained service handlers exposed via an API and delegates control of the generation process to user-provided programs, called inferlets. This enables applications to implement new KV cache strategies and bespoke generation logic, and to seamlessly integrate computation and I/O, entirely within the application and without requiring modifications to the serving system. Pie executes inferlets using WebAssembly, benefiting from its lightweight sandboxing. Our evaluation shows Pie matches state-of-the-art performance on standard tasks…
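To make the idea in the abstract concrete, here is a minimal Python sketch of a generation loop that is driven by a user program rather than by the server. It is not Pie's API: `ToyService`, its `forward`, `sample`, and `truncate_cache` handlers, and the `inferlet` function are all hypothetical names, and the "model" is a deterministic toy. The sketch only illustrates the division of labor, where the service exposes small handlers and the application decides when to stop and how to manage the cache.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyService:
    """Hypothetical stand-in for a server exposing fine-grained handlers."""
    vocab_size: int = 16
    kv_cache: List[int] = field(default_factory=list)  # toy cache: token ids seen so far

    def forward(self, tokens: List[int]) -> List[float]:
        """Append tokens to the cache and return toy next-token scores."""
        self.kv_cache.extend(tokens)
        nxt = (sum(self.kv_cache) + 1) % self.vocab_size
        return [1.0 if i == nxt else 0.0 for i in range(self.vocab_size)]

    def sample(self, scores: List[float]) -> int:
        """Greedy sampling: return the index of the highest score."""
        return max(range(len(scores)), key=scores.__getitem__)

    def truncate_cache(self, keep: int) -> None:
        """Keep only the most recent `keep` cached tokens."""
        self.kv_cache = self.kv_cache[-keep:] if keep > 0 else []


def inferlet(svc: ToyService, prompt: List[int], max_new: int,
             stop: int, window: int) -> List[int]:
    """Hypothetical user program that owns the generation loop."""
    out: List[int] = []
    pending = list(prompt)
    for _ in range(max_new):
        scores = svc.forward(pending)
        tok = svc.sample(scores)
        if tok == stop:             # stopping rule chosen by the application
            break
        out.append(tok)
        svc.truncate_cache(window)  # cache policy chosen by the application
        pending = [tok]
    return out


if __name__ == "__main__":
    service = ToyService()
    print(inferlet(service, [1, 2, 3], max_new=5, stop=0, window=4))
```

In this sketch, changing the stopping rule or the cache window means editing only `inferlet`; `ToyService` stays untouched, which mirrors the abstract's claim that such changes need no modification to the serving system.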
