Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
Jingfeng Wu, Yiyuan He, Minxian Xu, Xitong Gao, Kejiang Ye, Chengzhong Xu

TL;DR
CoCoServe introduces a dynamic, fine-grained scaling system for LLM serving that improves resource utilization, reduces costs by 46%, and enhances performance metrics like latency and throughput across various workloads.
Contribution
The paper presents CoCoServe, a novel elastic system enabling module-level dynamic scaling of LLMs, addressing resource management challenges in serving large models.
Findings
Cost reduction of 46% through improved resource utilization.
Latency improvements of 14%-75% over existing systems.
Throughput increases of 1.16x to 4x across models.
Abstract
The rise of large language models (LLMs) has created new opportunities across various fields but has also introduced significant challenges in resource management. Current LLM serving systems face a fundamental tension: balancing serving demands with limited resources while adapting to unpredictable traffic patterns. Static deployments lead to suboptimal resource utilization and performance degradation under dynamic workloads. Furthermore, the high cost of adjusting instances hinders dynamic scaling, limiting the true potential of efficient LLM serving. To address this, we propose CoCoServe, an elastic system that facilitates dynamic and fine-grained scaling. Its key innovation lies in the module-level operations for the replication and migration of LLM modules, such as decoder layers and projections. Through a comprehensive analysis of the trade-offs associated with these operations,…
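To make the idea of module-level replication and migration concrete, here is a minimal Python sketch. It is not CoCoServe's implementation or API: the `Placement` class, its methods, the module names (`decoder_layer_0`, ...), and the device names (`gpu0`, `gpu1`) are all hypothetical, and real weights, KV caches, and data transfer are ignored. It only illustrates tracking which devices hold a copy of each module, adding a replica of a single module, and moving a single module between devices.

```python
from dataclasses import dataclass, field


@dataclass
class Placement:
    """Toy placement map (hypothetical): module name -> devices holding a copy."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def deploy(self, module: str, device: str) -> None:
        """Place the initial copy of a module on one device."""
        self.replicas[module] = [device]

    def replicate(self, module: str, device: str) -> None:
        """Add a copy of a single module on another device (no-op if present)."""
        if device not in self.replicas[module]:
            self.replicas[module].append(device)

    def migrate(self, module: str, src: str, dst: str) -> None:
        """Move one copy of a module from src to dst."""
        devices = self.replicas[module]
        idx = devices.index(src)  # raises ValueError if src holds no copy
        if dst in devices:
            devices.pop(idx)      # dst already has a copy: just drop src
        else:
            devices[idx] = dst

    def route(self, module: str, request_id: int) -> str:
        """Pick a replica for a request with simple round-robin."""
        devices = self.replicas[module]
        return devices[request_id % len(devices)]


# Assumed setup: four decoder layers, all initially on one GPU.
placement = Placement()
for i in range(4):
    placement.deploy(f"decoder_layer_{i}", "gpu0")

# Scale one layer instead of the whole model instance.
placement.replicate("decoder_layer_2", "gpu1")
# Relieve pressure on gpu0 by moving a single layer.
placement.migrate("decoder_layer_3", "gpu0", "gpu1")
```

In this sketch only the affected module changes placement, and the rest of the model stays where it is. That is the contrast the abstract draws with instance-level scaling, where adjusting capacity means adding or moving a whole model instance.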
Taxonomy
Topics: Advanced Surface Polishing Techniques · Metal and Thin Film Mechanics · Advancements in Photolithography Techniques
