VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

Keisuke Kamahori; Shihang Li; Simon Peter; Baris Kasikci

arXiv:2605.06068·cs.AI·May 8, 2026

TL;DR

VibeServe is an agentic system that automatically synthesizes bespoke LLM serving stacks. In non-standard scenarios it outperforms generic systems by specializing at generation time.

Contribution

The authors present VibeServe as the first agentic loop that generates entire LLM serving stacks end-to-end, and show where tailored systems have an advantage over traditional general-purpose stacks.

Findings

01. VibeServe remains competitive with vLLM in standard deployment.

02. In non-standard scenarios, VibeServe outperforms existing systems.

03. Generation-time specialization can surpass runtime generality in infrastructure design.

Abstract

For years, we have built LLM serving systems like any other critical infrastructure: a single general-purpose stack, hand-tuned over many engineer-years, meant to support every model and workload. In this paper, we take the opposite bet: a multi-agent loop that automatically synthesizes bespoke serving systems for different usage scenarios. We propose VibeServe, the first agentic loop that generates entire LLM serving stacks end-to-end. VibeServe uses an outer loop to plan and track the search over system designs, and an inner loop to implement candidates, check correctness, and measure performance on the target benchmark. In the standard deployment setting, where existing stacks are highly optimized, VibeServe remains competitive with vLLM, showing that generation-time specialization need not come at the cost of performance. More interestingly, in non-standard scenarios, VibeServe…
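The two-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Candidate`, `ToyServer`, `SearchLog`, `inner_loop`, `outer_loop`) is hypothetical, the candidate designs are a fixed hand-written list rather than agent-generated plans, and the "benchmark" is a simulated latency rather than a real measurement. It only shows the control flow: the outer loop tracks the search, and the inner loop builds a candidate, rejects it if it is incorrect, and otherwise scores it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyServer:
    """Hypothetical stand-in for a generated serving system."""
    transform: Callable[[str], str]
    simulated_latency_s: float  # assumption: fixed number, not a real measurement

    def generate(self, prompt: str) -> str:
        return self.transform(prompt)


@dataclass
class Candidate:
    """A hypothetical system design: a name plus a way to build it."""
    name: str
    build: Callable[[], ToyServer]


@dataclass
class SearchLog:
    """Outer-loop bookkeeping: designs tried so far and the best one found."""
    tried: List[str] = field(default_factory=list)
    best_name: Optional[str] = None
    best_score: float = float("-inf")


def inner_loop(candidate: Candidate,
               is_correct: Callable[[ToyServer], bool],
               benchmark: Callable[[ToyServer], float]) -> Optional[float]:
    """Implement one candidate, check correctness, then measure it."""
    server = candidate.build()
    if not is_correct(server):
        return None  # incorrect systems are never scored
    return benchmark(server)


def outer_loop(candidates: List[Candidate],
               is_correct: Callable[[ToyServer], bool],
               benchmark: Callable[[ToyServer], float]) -> SearchLog:
    """Walk the candidate designs and record the best correct one."""
    log = SearchLog()
    for candidate in candidates:
        log.tried.append(candidate.name)
        score = inner_loop(candidate, is_correct, benchmark)
        if score is not None and score > log.best_score:
            log.best_name, log.best_score = candidate.name, score
    return log


# Toy workload: a "correct" server upper-cases its prompt.
def is_correct(server: ToyServer) -> bool:
    return server.generate("hello") == "HELLO"


# Toy benchmark: requests per second implied by the simulated latency.
def benchmark(server: ToyServer) -> float:
    return 1.0 / server.simulated_latency_s


candidates = [
    Candidate("baseline", lambda: ToyServer(str.upper, 0.20)),
    Candidate("fast_but_wrong", lambda: ToyServer(lambda p: p, 0.05)),
    Candidate("specialized", lambda: ToyServer(str.upper, 0.10)),
]

log = outer_loop(candidates, is_correct, benchmark)
print(log.tried, log.best_name)
```

In this toy run the fastest candidate is discarded because it fails the correctness check, so the search settles on the fastest design that is also correct. In a real agentic loop, the candidate list would presumably be proposed and revised during the search rather than fixed up front.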

Code & Models

Repositories

uw-syfi/vibe-serve (GitHub)
