ECLIP: Energy-efficient and Practical Co-Location of ML Inference on Spatially Partitioned GPUs

Ryan Quach; Yidi Wang; Ali Jahanshahi; Daniel Wong; Hyoseung Kim

arXiv:2506.12598 · eess.SY · June 17, 2025

Open Access

TL;DR

ECLIP is a framework that improves the energy efficiency and throughput of co-located ML inference on GPUs by minimizing repartitioning overheads through kernel-wise resource partitioning and optimized compute unit (CU) assignment.

Contribution

ECLIP introduces a low-overhead, kernel-wise resource partitioning framework with a resource optimizer to improve GPU utilization and energy efficiency during ML inference co-location.

Findings

01. 13% throughput improvement

02. 25% energy efficiency gain

03. Reduced repartitioning overheads

Abstract

As AI inference becomes mainstream, research has begun to focus on improving the energy consumption of inference servers. Inference kernels commonly underutilize a GPU's compute resources and waste power from idling components. To improve utilization and energy efficiency, multiple models can co-locate and share the GPU. However, typical GPU spatial partitioning techniques often experience significant overheads when reconfiguring spatial partitions, which can waste additional energy through repartitioning overheads or non-optimal partition configurations. In this paper, we present ECLIP, a framework to enable low-overhead energy-efficient kernel-wise resource partitioning between co-located inference kernels. ECLIP minimizes repartitioning overheads by pre-allocating pools of CU masked streams and assigns optimal CU assignments to groups of kernels through our resource allocation…
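
The pre-allocation idea in the abstract can be illustrated with a small, GPU-free sketch. It is not the authors' implementation: the CU count, the pool sizes, the group names, and every identifier (`MaskedStream`, `StreamPool`, `pick`) are assumptions made up for illustration, and a "stream" here is just a Python object carrying a bitmask. The point it shows is that when streams with fixed CU masks are created once up front, changing a kernel group's CU allocation becomes a lookup rather than a reconfiguration.

```python
from dataclasses import dataclass

TOTAL_CUS = 60  # assumed CU count, chosen only for illustration


@dataclass(frozen=True)
class MaskedStream:
    """Stand-in for a GPU stream restricted to a subset of CUs (hypothetical)."""
    num_cus: int
    mask: int  # bit i set means CU i is enabled for this stream


def make_mask(num_cus: int) -> int:
    """Return a bitmask with the lowest `num_cus` bits set."""
    if not 0 < num_cus <= TOTAL_CUS:
        raise ValueError("num_cus out of range")
    return (1 << num_cus) - 1


class StreamPool:
    """Pool of masked streams built once, before any kernel is dispatched."""

    def __init__(self, sizes):
        # Sorted insertion order lets pick() scan from smallest to largest.
        self._streams = {n: MaskedStream(n, make_mask(n)) for n in sorted(sizes)}

    def pick(self, requested_cus: int) -> MaskedStream:
        """Return the smallest pooled stream with at least `requested_cus` CUs."""
        for n, stream in self._streams.items():
            if n >= requested_cus:
                return stream
        raise ValueError("no pooled stream is large enough")


# Assumed pool granularity and per-group CU demands (both hypothetical).
pool = StreamPool([15, 30, 45, 60])
group_demand = {"conv_group": 40, "elementwise_group": 12}
assignment = {group: pool.pick(cus) for group, cus in group_demand.items()}
```

In this sketch every mask starts at CU 0, so two co-located groups would overlap; a real spatial partitioning scheme would presumably place co-located masks on disjoint CUs, and choosing the per-group CU counts is the job the abstract attributes to the resource optimizer.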

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Brain Tumor Detection and Classification