SmartLLMs Scheduler: A Framework for Cost-Effective LLMs Utilization
Yueyue Liu, Hongyu Zhang, Yuantian Miao

TL;DR
The paper introduces SmartLLMs Scheduler, a dynamic framework that optimizes LLM deployment costs and response times by learning performance patterns and adapting its strategies in real time, demonstrated on software engineering tasks.
Contribution
It presents a novel dynamic scheduling framework with adaptive caching and real-time updates, improving cost-efficiency and responsiveness for LLM utilization.
Findings
Achieves an average performance improvement of 198.82% over baseline methods.
Reduces average processing time by 63.28%.
Effectively adapts to task variability in LLM deployment.
Abstract
Large Language Models (LLMs) such as GPT-4 and Llama have shown remarkable capabilities in a variety of software engineering tasks. Despite the advancements, their practical deployment faces challenges, including high financial costs, long response time, and varying performance, especially when handling a large number of queries (jobs). Existing optimization strategies for deploying LLMs for diverse tasks focus on static scheduling, which requires extensive training data for performance prediction, increasing the computational costs and limiting the applicability and flexibility. In this paper, we propose the SmartLLMs Scheduler (SLS), a dynamic and cost-effective scheduling solution. The key idea is to learn LLMs' performance on diverse tasks and incorporate their real-time feedback to update strategies periodically. Specifically, SLS incorporates three key components, including an…
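
The key idea stated in the abstract, learning each LLM's performance and folding real-time feedback back into the scheduling strategy, can be illustrated with a small sketch. This is not the SLS algorithm: the abstract is truncated before it names the three components, so the class names (`ModelStats`, `FeedbackScheduler`), the fixed per-job cost, the optimistic prior, and the "cheapest model above a target success rate" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Running record for one candidate LLM (all fields are hypothetical)."""
    cost_per_job: float       # assumed flat price per query
    successes: float = 1.0    # prior: one success in two attempts,
    attempts: float = 2.0     # so an untried model starts at a 0.5 estimate

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts


class FeedbackScheduler:
    """Picks the cheapest model whose estimated success rate meets a target."""

    def __init__(self, models: dict[str, ModelStats], target_rate: float = 0.5):
        self.models = models
        self.target_rate = target_rate

    def choose(self) -> str:
        eligible = [
            name for name, stats in self.models.items()
            if stats.success_rate >= self.target_rate
        ]
        if eligible:
            # Among models that look good enough, minimize cost.
            return min(eligible, key=lambda n: self.models[n].cost_per_job)
        # Otherwise fall back to the model with the best estimate.
        return max(self.models, key=lambda n: self.models[n].success_rate)

    def record(self, name: str, succeeded: bool) -> None:
        # Feedback step: update the estimate after each completed job.
        stats = self.models[name]
        stats.attempts += 1
        if succeeded:
            stats.successes += 1


scheduler = FeedbackScheduler({
    "small-model": ModelStats(cost_per_job=0.1),
    "large-model": ModelStats(cost_per_job=1.0),
})
first_choice = scheduler.choose()          # the cheaper model is tried first
scheduler.record(first_choice, succeeded=False)
scheduler.record(first_choice, succeeded=False)
second_choice = scheduler.choose()         # repeated failures shift the choice
```

In this toy setup the scheduler starts with the cheaper model and moves to the more expensive one once observed failures push the cheap model's estimate below the target, which is the general feedback-driven behaviour the abstract describes, under the assumptions listed above.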
