SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

Christian Lysenstøen

arXiv:2604.17627 · cs.LG · April 21, 2026

TL;DR

SLO-Guard is a crash-aware autotuning system for large language model serving that improves budget consistency and latency stability under SLO constraints by treating crashes as first-class observations.

Contribution

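It introduces a crash-aware autotuning approach that combines a feasible-first Thermal Budget Annealing (TBA) exploration phase with a warm-started Tree-structured Parzen Estimator (TPE) exploitation phase. It also adds a configuration-repair pass, a GPU-aware KV-cache memory guard, and a four-category crash taxonomy, aiming for more predictable tuning when configurations fail.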

Findings

01

SLO-Guard achieves higher budget consistency and latency stability than random search.

02

Both methods attain 75/75 feasibility with zero crashes in the study.

03

SLO-Guard's cross-seed latency variation is 4.4x tighter than random search.

Abstract

Serving large language models under latency service-level objectives (SLOs) is a configuration-heavy systems problem with an unusually failure-prone search space: many plausible configurations crash outright or miss user-visible latency targets, and standard black-box optimizers treat these failures as wasted trials. We present SLO-Guard, a crash-aware autotuner for vLLM serving that treats crashes as first-class observations. SLO-Guard combines a feasible-first Thermal Budget Annealing (TBA) exploration phase with a warm-started Tree-structured Parzen Estimator (TPE) exploitation phase; the handoff replays all exploration history, including crashes encoded as extreme constraint violations. We additionally contribute a configuration-repair pass, a GPU-aware KV-cache memory guard, and a four-category crash taxonomy. We evaluate SLO-Guard on Qwen2-1.5B served with vLLM 0.19 on an NVIDIA…
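The idea of replaying crashes as extreme constraint violations can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Trial` record, the `CRASH_VIOLATION` sentinel, the `gamma` quantile, and the example configuration values are all assumptions made for illustration. The sketch maps each trial to a non-negative violation score (zero when the latency SLO is met, a very large value on a crash) and then applies the quantile split that TPE-style optimizers use to separate "good" from "bad" observations, so crashed configurations inform the model instead of being discarded.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed sentinel: any value far above a realistic latency overshoot.
CRASH_VIOLATION = 1e6


@dataclass
class Trial:
    """Hypothetical record of one tuning trial."""
    config: dict
    latency_ms: Optional[float]  # None when the server never produced a measurement
    crashed: bool = False


def constraint_violation(trial: Trial, slo_ms: float) -> float:
    """Return 0.0 if the SLO is met, the overshoot in ms if it is missed,
    and CRASH_VIOLATION if the trial crashed."""
    if trial.crashed or trial.latency_ms is None:
        return CRASH_VIOLATION
    return max(0.0, trial.latency_ms - slo_ms)


def encode_history(trials: list, slo_ms: float) -> list:
    """Turn the full exploration history, crashes included, into
    (config, violation) pairs that a downstream optimizer can replay."""
    return [(t.config, constraint_violation(t, slo_ms)) for t in trials]


def split_good_bad(encoded: list, gamma: float = 0.25):
    """Quantile split in the style of TPE: the lowest-violation fraction
    gamma is 'good', everything else (including crashes) is 'bad'."""
    ranked = sorted(encoded, key=lambda pair: pair[1])
    n_good = max(1, int(len(ranked) * gamma))
    return ranked[:n_good], ranked[n_good:]


# Illustrative history; the configuration values are made up.
history = [
    Trial({"max_num_seqs": 64}, latency_ms=80.0),
    Trial({"max_num_seqs": 256}, latency_ms=150.0),
    Trial({"max_num_seqs": 1024}, latency_ms=None, crashed=True),
    Trial({"max_num_seqs": 128}, latency_ms=90.0),
]
encoded = encode_history(history, slo_ms=100.0)
good, bad = split_good_bad(encoded)
```

With this encoding the crashed configuration sorts last and always falls into the "bad" group, which is one plausible way a warm-started optimizer could learn to steer away from crash-prone regions of the search space.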
