ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving
Tao Huang, Pengfei Chen, Kyoka Gong, Jocky Hawk, Zachary Bright, Wenxin Xie, Kecheng Huang, Zhi Ji

TL;DR
ENOVA is a comprehensive system that enables cost-effective, stable, autoscaled serverless LLM deployment on multi-GPU clusters through automatic configuration, performance monitoring, and scheduling.
Contribution
It introduces a novel deployment, monitoring, and autoscaling framework designed specifically for serverless LLM serving on multi-GPU clusters, addressing low GPU utilization and poor service quality.
Findings
In experiments, ENOVA significantly outperforms other state-of-the-art methods.
It achieves high GPU utilization and stable LLM service.
It is suitable for deployment in large-scale online systems.
Abstract
With the growing popularity of large language model (LLM) backend systems, deploying stable serverless LLM serving on multi-GPU clusters with autoscaling has become both common and necessary. However, this is challenging because the diversity and co-location of applications in multi-GPU clusters lead to low service quality and low GPU utilization. To address these challenges, we build ENOVA, a deployment, monitoring, and autoscaling service for serverless LLM serving. ENOVA comprehensively deconstructs the execution process of LLM services and, based on this analysis, designs a configuration recommendation module for automatic deployment on any GPU cluster and a performance detection module for autoscaling. On top of these modules, ENOVA implements a deployment execution engine for multi-GPU cluster scheduling. Experimental results show that ENOVA significantly outperforms other state-of-the-art methods and…
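To make the idea of "a performance detection module for autoscaling" concrete, here is a minimal sketch of a detection-driven scaling decision. It is not ENOVA's algorithm, since the abstract does not say how detection or scaling works. The z-score test on p95 latency, the GPU-utilization floor, and all function names and parameters (`p95`, `is_anomalous`, `decide_replicas`, `k`, `low_util`, replica bounds) are hypothetical stand-ins chosen for illustration.

```python
from statistics import mean, stdev


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def is_anomalous(history: list[float], current: float, k: float = 3.0) -> bool:
    """Hypothetical detector: flag `current` if it exceeds mean + k * stdev of `history`."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    return current > mean(history) + k * stdev(history)


def decide_replicas(
    replicas: int,
    p95_history: list[float],
    window_latencies_ms: list[float],
    gpu_util: float,
    min_replicas: int = 1,
    max_replicas: int = 8,
    low_util: float = 0.3,
) -> int:
    """Hypothetical policy: scale out on a latency anomaly, scale in when GPUs sit idle."""
    current = p95(window_latencies_ms)
    if is_anomalous(p95_history, current):
        return min(replicas + 1, max_replicas)  # service quality degrading: add a replica
    if gpu_util < low_util:
        return max(replicas - 1, min_replicas)  # low utilization: release a replica
    return replicas  # within normal range: keep the current deployment


# Example: recent per-window p95 latencies (ms) and two monitoring windows.
history = [200.0, 210.0, 205.0, 195.0, 202.0]
print(decide_replicas(2, history, [300.0] * 20, gpu_util=0.7))  # latency spike
print(decide_replicas(2, history, [200.0] * 20, gpu_util=0.1))  # idle GPUs
```

The sketch ties together the two concerns the abstract names, service quality (latency) and GPU utilization, with one scale-out rule and one scale-in rule. A production system would add cooldowns and placement decisions, which this example deliberately omits.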
