Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads

Grant Wilkins, Srinivasan Keshav, and Richard Mortier

arXiv:2407.00010 · cs.DC · July 2, 2024

TL;DR

This paper proposes a hybrid data center model with a dynamic, cost-based scheduling framework that allocates LLM inference tasks to different hardware according to the workload, reducing CPU+GPU energy consumption by 7.5%.

Contribution

The paper introduces a workload-aware hybrid scheduling strategy that improves the energy efficiency of LLM inference workloads running on heterogeneous hardware.

Findings

1. Reduces CPU+GPU energy consumption by 7.5% with the hybrid approach.

2. Uses workload-aware task allocation based on the number of input and output tokens in each query.

3. Demonstrates energy savings on a representative LLM dataset.
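The token-based allocation in the findings can be illustrated with a simple threshold router. This is a minimal sketch under stated assumptions, not the authors' implementation: the device classes `"efficient"` and `"gpu"`, the linear energy model in `ENERGY_MODEL` and its coefficients, and the thresholds `MAX_INPUT_FOR_EFFICIENT` and `MAX_OUTPUT_FOR_EFFICIENT` are all hypothetical placeholders, and the saving the code computes is not a result from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    input_tokens: int
    output_tokens: int


# Hypothetical linear energy model per device class (joules):
#   E = fixed + per_in * input_tokens + per_out * output_tokens
# Coefficients are illustrative placeholders, NOT values from the paper.
ENERGY_MODEL = {
    "efficient": (5.0, 0.05, 0.5),   # low overhead, higher marginal cost per token
    "gpu": (20.0, 0.01, 0.2),        # high overhead, lower marginal cost per token
}

# Hypothetical routing thresholds on query size.
MAX_INPUT_FOR_EFFICIENT = 32
MAX_OUTPUT_FOR_EFFICIENT = 32


def energy(device: str, q: Query) -> float:
    """Estimated energy (J) to serve query q on the given device class."""
    fixed, per_in, per_out = ENERGY_MODEL[device]
    return fixed + per_in * q.input_tokens + per_out * q.output_tokens


def route(q: Query) -> str:
    """Send small queries to the efficient processor, everything else to the GPU."""
    if (q.input_tokens <= MAX_INPUT_FOR_EFFICIENT
            and q.output_tokens <= MAX_OUTPUT_FOR_EFFICIENT):
        return "efficient"
    return "gpu"


def savings(workload: list[Query]) -> tuple[float, float, float]:
    """Return (hybrid energy, GPU-only energy, fractional saving vs GPU-only)."""
    hybrid = sum(energy(route(q), q) for q in workload)
    gpu_only = sum(energy("gpu", q) for q in workload)
    return hybrid, gpu_only, (gpu_only - hybrid) / gpu_only


if __name__ == "__main__":
    workload = [Query(10, 20), Query(500, 300), Query(30, 10), Query(1000, 50)]
    hybrid, gpu_only, frac = savings(workload)
    print(f"hybrid={hybrid:.1f} J, gpu-only={gpu_only:.1f} J, saving={frac:.1%}")
```

The illustrative coefficients give the efficient processor a lower fixed cost but a higher per-token cost, so short queries are cheaper there and long ones are cheaper on the GPU. In a real deployment the thresholds would have to be derived from measured energy profiles of the actual hardware.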

Abstract

Both the training and use of Large Language Models (LLMs) require large amounts of energy. Their increasing popularity, therefore, raises critical concerns regarding the energy efficiency and sustainability of the data centers that host them. This paper addresses the challenge of reducing energy consumption in data centers running LLMs. We propose a hybrid data center model that uses a cost-based scheduling framework to dynamically allocate LLM tasks across hardware accelerators that differ in their energy efficiencies and computational capabilities. Specifically, our workload-aware strategy determines whether tasks are processed on energy-efficient processors or high-performance GPUs based on the number of input and output tokens in a query. Our analysis of a representative LLM dataset finds that this hybrid strategy can reduce CPU+GPU energy consumption by 7.5% compared to a…
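The cost-based scheduling framework described in the abstract can be read as choosing, for each query, the device that minimizes an estimated cost. The sketch below assumes a cost of the form energy + lam × runtime, with hypothetical linear energy and runtime models per device; the `Device` class, the `schedule` function, all coefficients, and the weight `lam` are illustrative assumptions, and the paper's exact cost function may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    # Hypothetical linear models: value = fixed + per_in * n_in + per_out * n_out
    energy_coeffs: tuple[float, float, float]   # joules
    runtime_coeffs: tuple[float, float, float]  # seconds

    def energy(self, n_in: int, n_out: int) -> float:
        fixed, per_in, per_out = self.energy_coeffs
        return fixed + per_in * n_in + per_out * n_out

    def runtime(self, n_in: int, n_out: int) -> float:
        fixed, per_in, per_out = self.runtime_coeffs
        return fixed + per_in * n_in + per_out * n_out


def schedule(n_in: int, n_out: int, devices: list[Device], lam: float) -> Device:
    """Pick the device minimizing cost = energy + lam * runtime.

    lam (joules per second of latency) is a hypothetical knob trading energy
    against response time; lam = 0 minimizes estimated energy alone.
    """
    return min(
        devices,
        key=lambda d: d.energy(n_in, n_out) + lam * d.runtime(n_in, n_out),
    )


# Illustrative device profiles (placeholders, not measured values).
DEVICES = [
    Device("efficient", (5.0, 0.05, 0.5), (0.1, 0.002, 0.05)),
    Device("gpu", (20.0, 0.01, 0.2), (0.05, 0.0002, 0.01)),
]
```

Setting `lam = 0` selects the lowest-energy device, while a large `lam` penalizes latency and pushes even short queries onto the faster GPU. With these placeholder profiles, `schedule(10, 20, DEVICES, lam=0.0).name` is `"efficient"`, and with `lam=1000.0` it becomes `"gpu"`.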

Taxonomy

Topics: Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data