PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving

Sunghyeon Woo; Hoseung Kim; Sunghwan Shim; Minjung Jo; Hyunjoon Jeong; Jeongtae Lee; Joonghoon Kim; Sungjae Lee; Baeseong Park; Se Jung Kwon; Dongsoo Lee

arXiv:2602.12029 · cs.LG · February 13, 2026

TL;DR

PrefillShare shares the prefill stage across multiple models in disaggregated multi-LLM serving, reporting 4.5x lower p95 latency in multi-model workloads and 3.9x higher throughput than the baseline, while matching the accuracy of full fine-tuning.

Contribution

The paper introduces a prefill-sharing algorithm that factorizes each model into a prefill module and a decode module, freezes the prefill module, and fine-tunes only the decode module, so that the prefill stage and its KV cache can be reused across multiple models in disaggregated serving.
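The freeze-and-fine-tune idea can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: `ToyModel`, `finetune_decode_only`, and the scalar weights are hypothetical stand-ins, and a single number plays the role of the KV cache. It only shows the training pattern described above, where gradient updates touch the decode parameters and leave the prefill parameters unchanged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyModel:
    w_prefill: float  # hypothetical stand-in for the frozen, shared prefill module
    w_decode: float   # hypothetical stand-in for the task-specific decode module

    def prefill(self, x: float) -> float:
        # Toy "KV cache": one number derived from the prompt value x.
        return self.w_prefill * x

    def decode(self, cache: float) -> float:
        # The decode module consumes the cache produced by prefill.
        return self.w_decode * cache


def finetune_decode_only(
    model: ToyModel,
    data: List[Tuple[float, float]],
    lr: float = 0.01,
    steps: int = 200,
) -> ToyModel:
    """Gradient descent on squared error that updates only w_decode."""
    for _ in range(steps):
        for x, target in data:
            cache = model.prefill(x)  # frozen path: w_prefill is never updated
            pred = model.decode(cache)
            grad_w_decode = 2.0 * (pred - target) * cache
            model.w_decode -= lr * grad_w_decode
    return model


# Targets are 3x the cache, so the decode weight should move toward 3.0.
model = finetune_decode_only(ToyModel(w_prefill=2.0, w_decode=0.5),
                             data=[(1.0, 6.0), (2.0, 12.0)])
```

Because the prefill weight never changes, any cache it produced before fine-tuning remains valid for every decode module trained this way, which is the property that makes cross-model reuse possible in this toy setting.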

Findings

01. Achieves 4.5x lower p95 latency in multi-model workloads.

02. Attains 3.9x higher throughput than the baseline.

03. Matches the accuracy of full fine-tuning across various tasks and models.

Abstract

Multi-agent systems increasingly orchestrate multiple specialized language models to solve complex real-world problems, often invoking them over a shared context. This execution pattern repeatedly processes the same prompt prefix across models. Consequently, each model redundantly executes the prefill stage and maintains its own key-value (KV) cache, increasing aggregate prefill load and worsening tail latency by intensifying prefill-decode interference in existing LLM serving stacks. Disaggregated serving reduces such interference by placing prefill and decode on separate GPUs, but disaggregation does not fundamentally eliminate inter-model redundancy in computation and KV storage for the same prompt. To address this issue, we propose PrefillShare, a novel algorithm that enables sharing the prefill stage across multiple models in a disaggregated setting. PrefillShare factorizes the…
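The redundancy the abstract describes can be made concrete with a small accounting sketch. Everything here is an assumption for illustration: `toy_prefill`, `PerModelPrefill`, `SharedPrefill`, and `run` are hypothetical names, and a tuple of integers stands in for real KV tensors. The sketch compares how many prefill passes and cache entries are needed when several models see the same prompt, with and without a shared prefill stage. Sharing is only meaningful when all models use the same prefill module; independently trained models would normally produce different KV caches.

```python
from typing import Dict, List, Tuple

KV = Tuple[int, ...]  # toy stand-in for real key-value tensors


def toy_prefill(prompt: str) -> KV:
    """Pretend prefill: one integer per whitespace-separated token."""
    return tuple(len(token) for token in prompt.split())


class PerModelPrefill:
    """Baseline: every model runs prefill and keeps its own cache."""

    def __init__(self, model_names: List[str]) -> None:
        self.caches: Dict[str, Dict[str, KV]] = {name: {} for name in model_names}
        self.prefill_calls = 0

    def get_kv(self, model: str, prompt: str) -> KV:
        cache = self.caches[model]
        if prompt not in cache:
            self.prefill_calls += 1
            cache[prompt] = toy_prefill(prompt)
        return cache[prompt]

    def stored_entries(self) -> int:
        return sum(len(cache) for cache in self.caches.values())


class SharedPrefill:
    """Shared variant: one prefill pass and one cache entry per prompt."""

    def __init__(self) -> None:
        self.cache: Dict[str, KV] = {}
        self.prefill_calls = 0

    def get_kv(self, model: str, prompt: str) -> KV:
        # The model name is ignored: all models read the same entry.
        if prompt not in self.cache:
            self.prefill_calls += 1
            self.cache[prompt] = toy_prefill(prompt)
        return self.cache[prompt]

    def stored_entries(self) -> int:
        return len(self.cache)


def run(backend, model_names: List[str], prompt: str) -> Tuple[int, int]:
    """Send the same prompt to every model; return (prefill calls, cache entries)."""
    for name in model_names:
        backend.get_kv(name, prompt)
    return backend.prefill_calls, backend.stored_entries()
```

With three models and one shared prompt, `run(PerModelPrefill(models), models, prompt)` returns `(3, 3)`, while `run(SharedPrefill(), models, prompt)` returns `(1, 1)`. A real serving system would also have to handle KV transfer between prefill and decode GPUs, which this sketch omits.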

Taxonomy

Topics: Software System Performance and Reliability · Big Data and Digital Economy · Cloud Computing and Resource Management