DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

Omar Basit; Yunzhao Liu; Z. Jonny Kong; Y. Charlie Hu

arXiv:2602.18755 · cs.DC · April 7, 2026

TL;DR

DualScale is a hierarchical energy optimization framework for disaggregated LLM serving that dynamically manages placement and GPU frequency to reduce energy consumption while meeting strict latency and throughput SLOs.

Contribution

DualScale introduces a two-tier control system that combines phase-aware placement with stage-specific frequency adaptation for energy-efficient LLM serving.

Findings

01

Reduces energy by up to 39% in prefill and 48% in decode.

02

Meets TTFT and TPOT SLOs on a 16x H100 cluster.

03

Uses predictive latency and power models with stage-specific, per-iteration GPU frequency control.

Abstract

Prefill/decode disaggregation is increasingly adopted in LLM serving to improve the latency-throughput tradeoff and meet strict TTFT and TPOT SLOs. However, LLM inference remains energy-hungry: autoscaling alone is too coarse-grained to track fast workload fluctuations, and applying fine-grained DVFS under disaggregation is complicated by phase-asymmetric dynamics and coupling between provisioning and frequency control. We present DualScale, a two-tier energy optimization framework for disaggregated LLM serving. DualScale jointly optimizes placement and DVFS across prefill and decode using predictive latency and power models. At coarse timescales, DualScale computes phase-aware placement and baseline frequencies that minimize energy while satisfying SLO constraints. At fine timescales, DualScale dynamically adapts GPU frequency per iteration using stage-specific control: model…
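The coarse-timescale step described in the abstract (choose a placement and baseline frequency that minimize energy subject to an SLO) can be illustrated with a minimal sketch. This is not the paper's algorithm: the functions `latency_ms` and `power_w`, their constants, and the candidate GPU counts and frequencies are all hypothetical stand-ins for the predictive latency and power models the abstract mentions. The sketch handles a single phase and treats power over a fixed control interval as a proxy for energy.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence

# All models and constants below are illustrative assumptions,
# not values or formulas taken from the DualScale paper.
MAX_FREQ_MHZ = 1980.0  # placeholder reference clock


@dataclass(frozen=True)
class Config:
    gpus: int
    freq_mhz: int


def latency_ms(load_rps: float, gpus: int, freq_mhz: int) -> float:
    """Toy latency model: grows with per-GPU load, shrinks with clock."""
    return 40.0 * (load_rps / gpus) * (MAX_FREQ_MHZ / freq_mhz)


def power_w(gpus: int, freq_mhz: int) -> float:
    """Toy power model: static draw plus a cubic dynamic term per GPU."""
    f = freq_mhz / MAX_FREQ_MHZ
    return gpus * (100.0 + 600.0 * f ** 3)


def pick_config(
    load_rps: float,
    slo_ms: float,
    gpu_options: Sequence[int],
    freq_options: Sequence[int],
) -> Optional[Config]:
    """Exhaustively search for the lowest-power config that meets the SLO.

    Returns None when no candidate satisfies the latency constraint.
    """
    feasible = [
        Config(g, f)
        for g, f in product(gpu_options, freq_options)
        if latency_ms(load_rps, g, f) <= slo_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: power_w(c.gpus, c.freq_mhz))


if __name__ == "__main__":
    best = pick_config(
        load_rps=8.0,
        slo_ms=200.0,
        gpu_options=[2, 4, 8],
        freq_options=[990, 1485, 1980],
    )
    print(best)
```

Under these toy models, running more GPUs at a lower clock can draw less power than running fewer GPUs at full clock, which is the kind of coupling between provisioning and frequency that the abstract points to. A real system would replace the exhaustive search and both models with fitted predictors and would handle prefill and decode separately.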
