Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

Juntao Zhao; Jiuru Li; Chuan Wu

arXiv:2507.18454 · cs.AR · April 16, 2026

TL;DR

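Sandwich is a full-stack CPU LLM serving system that jointly searches serving configurations and hot-switches between them at prefill/decode phase boundaries. It achieves an average 2.01x end-to-end speedup across five x86/ARM CPU platforms and up to 3.40x latency reduction over state-of-the-art systems.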

Contribution

Sandwich takes a full-stack approach built on three components: seamless phase-wise plan switching between prefill and decode, TopoTree-based hardware-aware (sub-NUMA) core allocation, and fast-start-then-finetune dynamic-shape tensor program generation.
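Phase-wise plan switching can be illustrated with a small sketch. It assumes, hypothetically, that a "plan" is just a core set plus a thread count found ahead of time for each phase, and that switching means selecting the other pre-built plan rather than rebuilding anything. `Plan`, `PhaseSwitcher`, and `run_request` are illustrative names, not Sandwich's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"
    DECODE = "decode"


@dataclass(frozen=True)
class Plan:
    """Hypothetical per-phase execution plan, assumed to come from an offline search."""
    cores: tuple[int, ...]
    num_threads: int


class PhaseSwitcher:
    """Holds one pre-built plan per phase and swaps the active plan at phase boundaries.

    Both plans already exist, so a switch is a lookup rather than a re-initialization,
    and prefill and decode never share one compromise configuration.
    """

    def __init__(self, plans: dict[Phase, Plan]) -> None:
        self.plans = plans
        self.active: Phase | None = None
        self.switches = 0  # times the active phase changed, including the first activation

    def plan_for(self, phase: Phase) -> Plan:
        if phase is not self.active:
            self.active = phase
            self.switches += 1
        return self.plans[phase]


def run_request(switcher: PhaseSwitcher, max_new_tokens: int) -> list[Plan]:
    """Return the plan used at each step: one prefill step, then one decode step per token."""
    used = [switcher.plan_for(Phase.PREFILL)]
    for _ in range(max_new_tokens):
        used.append(switcher.plan_for(Phase.DECODE))
    return used


# Illustrative numbers only: prefill is commonly compute-bound and gets more cores,
# while decode is commonly memory-bandwidth-bound and gets fewer.
plans = {
    Phase.PREFILL: Plan(cores=tuple(range(16)), num_threads=16),
    Phase.DECODE: Plan(cores=tuple(range(8)), num_threads=8),
}
switcher = PhaseSwitcher(plans)
steps = run_request(switcher, max_new_tokens=3)
print([p.num_threads for p in steps], switcher.switches)  # [16, 8, 8, 8] 2
```

The design choice being illustrated is that the cost of choosing a plan is paid once, offline, so the per-step switch stays cheap.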

Findings

01

Average 2.01x end-to-end speedup across five CPU platforms

02

Up to 3.40x latency reduction over state-of-the-art systems

03

Kernels match static compiler performance with much lower tuning cost
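Finding 03 concerns Sandwich's dynamic-shape kernels, which are generated in a fast-start-then-finetune fashion. A minimal sketch of that general pattern, under stated assumptions: a newly seen shape is served immediately with a rule-of-thumb tiling, and a later tuning pass replaces it with the best of a small candidate set. `TileConfig`, `KernelCache`, the candidate grid, and the `cost` callback (a stand-in for on-device timing) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical tiling parameters for a GEMM-like kernel."""
    tile_m: int
    tile_n: int


# Hypothetical search space explored during fine-tuning.
CANDIDATES = [TileConfig(m, n) for m in (8, 16, 32, 64) for n in (16, 32, 64)]


def heuristic_config(m: int, n: int) -> TileConfig:
    """Fast start: pick a config instantly from a rule of thumb, without measuring."""
    tile_m = max((t for t in (8, 16, 32, 64) if t <= m), default=8)
    return TileConfig(tile_m=tile_m, tile_n=32)


class KernelCache:
    """Maps a runtime shape (m, n) to the tiling currently used for it."""

    def __init__(self, cost: Callable[[int, int, TileConfig], float]) -> None:
        self.cost = cost  # stand-in for measured kernel latency
        self.configs: dict[tuple[int, int], TileConfig] = {}
        self.pending: list[tuple[int, int]] = []  # shapes still on their heuristic config

    def get(self, m: int, n: int) -> TileConfig:
        key = (m, n)
        if key not in self.configs:
            self.configs[key] = heuristic_config(m, n)  # serve right away
            self.pending.append(key)  # queue for later fine-tuning
        return self.configs[key]

    def finetune(self, budget: int) -> None:
        """Fine-tune up to `budget` pending shapes by scoring every candidate."""
        for _ in range(min(budget, len(self.pending))):
            m, n = self.pending.pop(0)
            self.configs[(m, n)] = min(CANDIDATES, key=lambda c: self.cost(m, n, c))


# Toy cost model for illustration only: prefer tiles that match the shape.
cache = KernelCache(cost=lambda m, n, c: abs(c.tile_m - m) + abs(c.tile_n - n))
print(cache.get(20, 64))  # TileConfig(tile_m=16, tile_n=32)  (fast start)
cache.finetune(budget=1)
print(cache.get(20, 64))  # TileConfig(tile_m=16, tile_n=64)  (fine-tuned)
```

The point of the split is that no request waits on tuning: the heuristic config bounds start-up cost, and the `budget` argument bounds how much tuning work runs at a time.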

Abstract

CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill/decode resource demands under non-disaggregated deployment constraints: existing solutions fail to avoid cross-phase interference, ignore sub-NUMA hardware structures, and deliver suboptimal dynamic-shape kernel performance. We propose Sandwich, a full-stack CPU LLM serving system with three core innovations addressing these challenges: (1) seamless phase-wise plan switching to eliminate cross-phase interference; (2) TopoTree, a tree-based hardware abstraction for automated substructure-aware (e.g., LLC slices) partial core allocation; (3) fast-start-then-finetune dynamic-shape tensor program generation. Across five x86/ARM CPU platforms, Sandwich achieves an average 2.01x end-to-end speedup and up to 3.40x latency…
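To make the TopoTree idea more concrete, here is a small sketch of a tree-shaped hardware abstraction (machine, socket, NUMA node, LLC slice, core) with a locality-first partial core allocator. The level names, `build_tree`, and the allocation policy (descend into the deepest subtree that can still hold all requested cores) are assumptions for illustration; the abstract does not specify TopoTree's data layout or allocation algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopoNode:
    """One node of a hypothetical hardware tree (machine, socket, NUMA node, LLC slice, core)."""
    name: str
    children: list[TopoNode] = field(default_factory=list)
    core_id: int | None = None  # set only on leaves (physical cores)

    def cores(self) -> list[int]:
        """All core ids under this node, in tree order."""
        if self.core_id is not None:
            return [self.core_id]
        return [c for child in self.children for c in child.cores()]


def build_tree(sockets: int, numa_per_socket: int, llc_per_numa: int,
               cores_per_llc: int) -> TopoNode:
    """Build a uniform tree with consecutively numbered cores (an assumed layout)."""
    next_core = 0
    root = TopoNode("machine")
    for s in range(sockets):
        sock = TopoNode(f"socket{s}")
        for n in range(numa_per_socket):
            numa = TopoNode(f"socket{s}/numa{n}")
            for llc_idx in range(llc_per_numa):
                llc = TopoNode(f"socket{s}/numa{n}/llc{llc_idx}")
                for _ in range(cores_per_llc):
                    llc.children.append(TopoNode(f"core{next_core}", core_id=next_core))
                    next_core += 1
                numa.children.append(llc)
            sock.children.append(numa)
        root.children.append(sock)
    return root


def allocate(node: TopoNode, k: int) -> list[int]:
    """Pick k cores from the deepest subtree that still has at least k cores.

    Hypothetical locality-first policy: a partial allocation stays inside one
    LLC slice or NUMA node when it fits. This sketch does not track cores that
    were already handed out; a real allocator would.
    """
    available = node.cores()
    if k > len(available):
        raise ValueError(f"requested {k} cores, only {len(available)} available")
    for child in node.children:
        if len(child.cores()) >= k:
            return allocate(child, k)
    # No single child fits: take cores in tree order, filling whole children first.
    return available[:k]


machine = build_tree(sockets=2, numa_per_socket=2, llc_per_numa=2, cores_per_llc=4)
print(allocate(machine, 4))  # [0, 1, 2, 3]  (fits in one LLC slice)
print(allocate(machine, 6))  # [0, 1, 2, 3, 4, 5]  (two slices of one NUMA node)
```

A tree makes "substructure-aware" a matter of where the search stops descending, so the same allocator works whether a platform exposes LLC slices, sub-NUMA clusters, or neither.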
