DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving

Junhan Liao; Minxian Xu; Wanyi Zheng; Yan Wang; Kejiang Ye; Rajkumar Buyya; Chengzhong Xu

arXiv:2511.20982 · cs.DC · March 10, 2026

Open Access

TL;DR

This paper introduces DOPD, a dynamic system for LLM inference that optimizes resource allocation between prefill and decoding stages, significantly improving throughput and latency under varying workloads.

Contribution

DOPD is the first adaptive architecture that dynamically adjusts prefill and decoding instance ratios based on real-time load to maximize goodput in LLM serving.

Findings

01 Up to 1.5x increase in system goodput

02 Up to 67.5% reduction in time-to-first-token

03 Over 99% SLO attainment with fewer resources

Abstract

To meet strict Service-Level Objectives (SLOs), contemporary Large Language Model (LLM) serving systems decouple the prefill and decoding stages and place them on separate GPUs to mitigate the distinct bottlenecks inherent to each phase. However, the heterogeneity of LLM workloads causes a producer-consumer imbalance between the two instance types in such a disaggregated architecture. To address this problem, we propose DOPD (Dynamic Optimal Prefill/Decoding), a dynamic LLM inference system that adjusts instance allocations to achieve an optimal prefill-to-decoding (P/D) ratio based on real-time load monitoring. Combined with an appropriate request-scheduling policy, DOPD effectively resolves imbalances between prefill and decoding instances and mitigates resource allocation mismatches due to mixed-length requests under high concurrency. Experimental evaluations show that, compared with vLLM and DistServe…
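
The producer-consumer framing in the abstract can be illustrated with a small sketch. This is not DOPD's algorithm, which the abstract does not spell out; it is a minimal steady-state balance under the assumption that each prefill instance completes requests at a measured rate and each decoding instance drains them at another measured rate. The function name `balanced_split` and both rate parameters are hypothetical.

```python
def balanced_split(total_instances: int, prefill_rate: float, decode_rate: float) -> tuple[int, int]:
    """Split a fixed pool of instances into (prefill, decode) counts.

    Assumption (not from the paper): prefill_rate and decode_rate are the
    requests per second that ONE instance of each type can complete under
    the currently observed workload.
    """
    if total_instances < 2:
        raise ValueError("need at least one prefill and one decode instance")
    if prefill_rate <= 0 or decode_rate <= 0:
        raise ValueError("rates must be positive")

    # Balance condition: p * prefill_rate == d * decode_rate, with p + d == total.
    # Solving for p gives total * decode_rate / (prefill_rate + decode_rate).
    ideal_prefill = total_instances * decode_rate / (prefill_rate + decode_rate)

    # Round to a whole instance and keep at least one instance of each type.
    prefill = min(max(round(ideal_prefill), 1), total_instances - 1)
    return prefill, total_instances - prefill


if __name__ == "__main__":
    # Prefill is 3x faster per instance than decode, so decode needs 3x the instances.
    print(balanced_split(8, prefill_rate=3.0, decode_rate=1.0))
```

Because both rates shift with the mix of prompt and output lengths, re-running such a calculation on fresh measurements is one simple way to picture what adjusting the P/D ratio from real-time load means.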

Taxonomy

Topics: Big Data and Digital Economy · Software System Performance and Reliability · Natural Language Processing Techniques