Topology-aware Preemptive Scheduling for Co-located LLM Workloads

Ping Zhang; Lei Su; Jinjie Yang; Xin Chen

arXiv:2411.11560·cs.DC·November 19, 2024


TL;DR

This paper introduces a topology-aware preemptive scheduling method for co-located large language model workloads, significantly improving resource utilization and performance by aligning resource topology with workload priorities.

Contribution

It presents a novel fine-grained topology-aware preemption approach that ensures resource topology preferences are met, enhancing efficiency for co-located LLM workloads.

Findings

01 Preemption efficiency increased by 55%.

02 Improved resource utilization in co-located workloads.

03 Enhanced performance for latency-sensitive LLM services.

Abstract

Hosting diverse large language model workloads in a unified resource pool through co-location is cost-effective. For example, long-running chat services generally follow diurnal traffic patterns, which motivates co-locating batch jobs to fill the resource valleys between successive peaks and thus saturate resource allocation cluster-wide. These heterogeneous workloads often have different business priorities, so preemption can be leveraged for resource elasticity. However, workloads often have distinct topology preferences as well. The resources released by lower-priority instances may fail to meet the requirements of high-priority online services, which are usually latency-sensitive. The root cause of this mismatch is a lack of topology awareness in the resource scheduler, especially during preemption. To bridge this gap, we develop a fine-grained topology-aware…
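The mismatch described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or the godel-scheduler implementation; it is a minimal example under stated assumptions: each pod holds exactly one GPU, a "domain" stands in for any topology unit (for instance a NUMA node or PCIe switch), and the names `Gpu` and `pick_victims` are hypothetical. It chooses preemption victims so that all freed GPUs fall inside a single domain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gpu:
    gpu_id: int
    domain: int                   # hypothetical topology domain (e.g. NUMA node)
    owner: Optional[str] = None   # pod holding this GPU, None if free
    priority: int = 0             # owner's priority; higher means more important


def pick_victims(gpus, need, incoming_priority):
    """Return (domain, victim_pods) that frees `need` GPUs in one domain, or None."""
    best = None
    for domain in sorted({g.domain for g in gpus}):
        members = [g for g in gpus if g.domain == domain]
        free = sum(1 for g in members if g.owner is None)
        # Only strictly lower-priority pods may be preempted; lowest priority first.
        candidates = sorted(
            (g for g in members
             if g.owner is not None and g.priority < incoming_priority),
            key=lambda g: g.priority,
        )
        shortfall = max(0, need - free)
        if shortfall > len(candidates):
            continue  # this domain cannot satisfy the request even after preemption
        victims = candidates[:shortfall]
        # Assumed cost model: fewest victims first, then lowest total priority.
        cost = (len(victims), sum(g.priority for g in victims))
        if best is None or cost < best[0]:
            best = (cost, domain, [g.owner for g in victims])
    return None if best is None else (best[1], best[2])


# Example: a priority-5 service needs 2 GPUs in the same domain.
gpus = [
    Gpu(0, 0, "batch-a", 1),
    Gpu(1, 0, "chat-x", 9),
    Gpu(2, 1, "batch-b", 1),
    Gpu(3, 1, "batch-c", 2),
]
result = pick_victims(gpus, need=2, incoming_priority=5)
print(result)
```

In this example, a purely priority-ordered preemption would evict `batch-a` and `batch-b`, the two lowest-priority pods, and free GPUs 0 and 2 in different domains. The sketch instead evicts `batch-b` and `batch-c`, so both freed GPUs sit in domain 1.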


Code & Models

Repositories

agiping/godel-scheduler (official)


Taxonomy

Topics: Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms