SmartLLMs Scheduler: A Framework for Cost-Effective LLMs Utilization

Yueyue Liu; Hongyu Zhang; Yuantian Miao

arXiv:2508.03258·cs.SE·August 6, 2025

TL;DR

The paper introduces SmartLLMs Scheduler, a dynamic framework that reduces LLM deployment costs and response times by learning performance patterns and adapting its strategies in real time, demonstrated on software engineering tasks.

Contribution

It presents a novel dynamic scheduling framework with adaptive caching and real-time updates, improving cost-efficiency and responsiveness for LLM utilization.

Findings

01. Achieves 198.82% performance improvement over baselines.

02. Reduces processing time by 63.28%.

03. Effectively adapts to task variability in LLM deployment.

Abstract

Large Language Models (LLMs) such as GPT-4 and Llama have shown remarkable capabilities in a variety of software engineering tasks. Despite these advances, their practical deployment faces challenges, including high financial costs, long response times, and varying performance, especially when handling a large number of queries (jobs). Existing optimization strategies for deploying LLMs on diverse tasks focus on static scheduling, which requires extensive training data for performance prediction, increasing computational costs and limiting applicability and flexibility. In this paper, we propose the SmartLLMs Scheduler (SLS), a dynamic and cost-effective scheduling solution. The key idea is to learn LLMs' performance on diverse tasks and incorporate their real-time feedback to update strategies periodically. Specifically, SLS incorporates three key components, including an…
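
The key idea stated in the abstract (learn per-task performance, then update the strategy from real-time feedback) can be illustrated with a toy sketch. This is not the SLS algorithm, whose components are not described in the text available here. The class names, the smoothing prior, the fixed per-model costs, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Running success record for one (task, model) pair. Hypothetical."""
    successes: int = 0
    attempts: int = 0

    def success_rate(self, prior: float = 0.5) -> float:
        # Smoothed estimate: an untried model starts at the prior (0.5).
        return (self.successes + prior) / (self.attempts + 1)


class ToyScheduler:
    """Routes a job to the cheapest model whose estimated success rate
    clears a threshold, and refines the estimates from feedback."""

    def __init__(self, costs: dict[str, float], threshold: float = 0.6):
        self.costs = costs            # assumed fixed cost per job, per model
        self.threshold = threshold    # assumed quality bar
        self.stats: dict[tuple[str, str], ModelStats] = {}

    def _get(self, task: str, model: str) -> ModelStats:
        return self.stats.setdefault((task, model), ModelStats())

    def choose(self, task: str) -> str:
        cheapest_first = sorted(self.costs, key=self.costs.get)
        for model in cheapest_first:
            if self._get(task, model).success_rate() >= self.threshold:
                return model
        # No model clears the bar yet: take the best current estimate
        # (ties resolve to the cheaper model because of the sort order).
        return max(cheapest_first,
                   key=lambda m: self._get(task, m).success_rate())

    def record(self, task: str, model: str, success: bool) -> None:
        # Feedback step: update the estimate after each completed job.
        stats = self._get(task, model)
        stats.attempts += 1
        stats.successes += int(success)
```

In this sketch, a scheduler built with `ToyScheduler({"small": 1.0, "large": 10.0})` starts by sending jobs to the cheaper model and shifts a task type to the larger one only after recorded failures lower the cheaper model's estimate for that task. A static scheduler would instead fix the routing before any jobs run, which is the limitation the abstract attributes to existing approaches.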
