A dynamic parallel method for performance optimization on hybrid CPUs

Luo Yu; Liu Yucheng; Shen Haihao

arXiv:2411.19542 · cs.DC · December 2, 2024

TL;DR

This paper introduces a dynamic parallel method that optimizes AI inference performance on hybrid CPUs by balancing workload across cores, significantly improving memory bandwidth utilization and inference speed.

Contribution

The paper presents a dynamic parallel approach designed specifically for hybrid CPUs, which improves AI inference performance by balancing the workload across cores.

Findings

01. Achieved more than 90% memory bandwidth utilization, on average, on two hybrid Intel CPUs.

02. Significantly increased LLM inference performance.

03. Balanced the workload across the cores of a hybrid CPU before parallel work starts.

Abstract

The AIPC concept is gaining popularity, and more and more hybrid CPUs will be running AI models on client devices. However, the current AI inference framework overlooks the imbalanced hardware capability of hybrid CPUs, leading to low inference performance. To address this issue, we have introduced a dynamic parallel method for hybrid CPUs, which significantly increases LLM inference performance by balancing the workload for each core of a hybrid CPU before the parallel work starts. This method has enabled Neural Speed to achieve more than 90% (on average) of memory bandwidth on two hybrid Intel CPUs.
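
The abstract describes balancing the workload for each core before parallel work starts, but this page does not give the algorithm itself. The following is only a minimal sketch of one way such balancing could look, assuming each core has a relative performance ratio and that work is divided in proportion to it. The names `split_work`, `update_ratios`, `core_ratios`, and the smoothing factor `alpha` are hypothetical and are not taken from the paper or from Neural Speed.

```python
from typing import List


def split_work(total_items: int, core_ratios: List[float]) -> List[int]:
    """Split total_items across cores in proportion to core_ratios (assumed)."""
    total_ratio = sum(core_ratios)
    # Floor each proportional share first.
    shares = [int(total_items * r / total_ratio) for r in core_ratios]
    # Hand the leftover items, one each, to the fastest cores.
    leftover = total_items - sum(shares)
    fastest_first = sorted(range(len(core_ratios)),
                           key=lambda i: core_ratios[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def update_ratios(core_ratios: List[float], shares: List[int],
                  elapsed: List[float], alpha: float = 0.5) -> List[float]:
    """Blend the old ratios with the throughput observed in the last run.

    Assumes every elapsed time is positive. Both the old ratios and the
    observed throughputs are normalized to sum to 1 before blending.
    """
    old_total = sum(core_ratios)
    observed = [s / t for s, t in zip(shares, elapsed)]  # items per second
    obs_total = sum(observed)
    return [(1 - alpha) * (r / old_total) + alpha * (o / obs_total)
            for r, o in zip(core_ratios, observed)]


# Hypothetical hybrid CPU: two fast cores and two slow cores.
ratios = [2.0, 2.0, 1.0, 1.0]
print(split_work(100, ratios))  # [34, 34, 16, 16]
```

In this sketch the split is computed before any parallel work is launched, so the slower cores receive less work and all cores would ideally finish at about the same time. The `update_ratios` step is one possible way to keep the ratios current from measured run times; whether the paper's method works this way is not stated here.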

Taxonomy

Topics: Parallel Computing and Optimization Techniques