Regulating Branch Parallelism in LLM Serving

Swapnil Gandhi; Siva Hari; William J. Dally; Christos Kozyrakis

arXiv:2605.06914·cs.DC·May 11, 2026

TL;DR

This paper introduces TAPER, a per-step admission controller for branch parallelism in LLM serving. It admits extra branches only when their predicted added step latency fits within the batch's current slack, which raises goodput while keeping SLO attainment high.

Contribution

TAPER is a per-step admission controller that treats extra branches as opportunistic work, admitting them only when the predicted branch externality fits within the batch's current slack budget.

Findings

01

TAPER improves goodput by 1.77x over IRP-Off and 1.48x over IRP-Eager.

02

TAPER maintains over 95% SLO attainment.

03

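Existing approaches are brittle: eager admission inflates the shared decode step and degrades co-batched requests in serial stages, while conservative fixed caps forgo the throughput that exposing branches was meant to deliver.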

Abstract

Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches eagerly or under fixed caps. We show that both are brittle: eager admission inflates the shared decode step, degrading co-batched requests in serial stages, while conservative fixed caps forgo the throughput that motivated exposing branches in the first place. We call the excess step latency caused by admitted branches the branch externality and show that the safe width depends on batch composition, context lengths, and accumulated slack, all of which change continuously over a workload trace. We introduce TAPER, a per-step admission controller that treats extra branches as opportunistic work, admitted only when the predicted branch externality fits within the batch's current slack budget. Per-step regulation is practical…
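The admission rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear step-latency model, its coefficients, and the names `StepModel` and `admit_branches` are all assumptions chosen for illustration, and the slack budget is passed in as a single number (for example, the smallest remaining slack among co-batched requests).

```python
from dataclasses import dataclass


@dataclass
class StepModel:
    """Hypothetical linear predictor of one decode step's latency."""
    base_ms: float = 5.0             # assumed fixed cost per step
    per_seq_ms: float = 0.2          # assumed cost per sequence in the batch
    per_ctx_token_ms: float = 0.001  # assumed cost per context token attended

    def latency_ms(self, context_lens):
        return (self.base_ms
                + self.per_seq_ms * len(context_lens)
                + self.per_ctx_token_ms * sum(context_lens))


def admit_branches(model, running_ctx, candidate_ctx, slack_ms):
    """Greedily admit candidate branches for the next step.

    A branch is admitted only if the predicted extra step latency
    (the "branch externality") of everything admitted so far, plus
    that branch, still fits within slack_ms.
    """
    baseline = model.latency_ms(running_ctx)
    admitted = []
    for ctx in candidate_ctx:
        trial = admitted + [ctx]
        externality = model.latency_ms(running_ctx + trial) - baseline
        if externality > slack_ms:
            break  # further branches would only add more latency
        admitted = trial
    return admitted


if __name__ == "__main__":
    model = StepModel()
    running = [1000, 2000]      # context lengths already in the batch
    candidates = [500, 500, 500]  # context lengths of waiting branches
    print(len(admit_branches(model, running, candidates, slack_ms=1.5)))
```

Because the function is re-evaluated every step, the admitted width follows changes in batch composition, context lengths, and slack rather than staying at a fixed cap. With zero slack it admits nothing, which corresponds to running without extra branches.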
