A dynamic parallel method for performance optimization on hybrid CPUs
Luo Yu, Liu Yucheng, Shen Haihao

TL;DR
This paper introduces a dynamic parallel method that optimizes AI inference performance on hybrid CPUs by balancing workload across cores, significantly improving memory bandwidth utilization and inference speed.
Contribution
The paper presents a dynamic parallel approach designed for hybrid CPUs that improves AI inference performance by balancing the workload across cores.
Findings
Neural Speed achieved more than 90% of memory bandwidth, on average, on two hybrid Intel CPUs.
Significantly increased LLM inference performance.
Balanced the workload for each core of a hybrid CPU before the parallel work starts.
Abstract
The AIPC concept is gaining popularity, and a growing number of hybrid CPUs will run AI models on client devices. However, the current AI inference framework overlooks the imbalanced hardware capabilities of hybrid CPUs, which leads to low inference performance. To address this issue, we introduce a dynamic parallel method for hybrid CPUs that significantly increases LLM inference performance by balancing the workload for each core of a hybrid CPU before the parallel work starts. With this method, Neural Speed achieves more than 90% (on average) of memory bandwidth on two hybrid Intel CPUs.
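The abstract does not spell out the balancing algorithm, so the following is only a minimal sketch of one way workload could be balanced per core before parallel work starts. It assumes each core has a performance ratio that is refreshed from measured execution times. The function names split_work and update_ratios, the smoothing factor alpha, and the example timings are all hypothetical and are not taken from the paper or from Neural Speed.

```python
def split_work(total_items, ratios):
    """Split total_items into per-core chunk sizes proportional to ratios.

    Hypothetical helper: `ratios` holds one relative performance
    estimate per core (larger means faster).
    """
    total_ratio = sum(ratios)
    sizes = [int(total_items * r / total_ratio) for r in ratios]
    # Rounding down can leave a few items unassigned; give them to the
    # cores with the highest ratios first.
    remainder = total_items - sum(sizes)
    order = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    return sizes


def update_ratios(ratios, sizes, elapsed, alpha=0.5):
    """Blend old ratios with measured throughput (items per second).

    Assumes every core received at least one item and reported a
    positive elapsed time. Both the old ratios and the measured
    throughputs are normalized to sum to 1 before blending, so the
    returned ratios also sum to 1.
    """
    throughput = [s / t for s, t in zip(sizes, elapsed)]
    total_old = sum(ratios)
    total_new = sum(throughput)
    return [
        (1 - alpha) * r / total_old + alpha * m / total_new
        for r, m in zip(ratios, throughput)
    ]


# Example: four cores that start with equal ratios. The first two are
# assumed to finish their chunks three times faster than the last two.
ratios = [1.0, 1.0, 1.0, 1.0]
sizes = split_work(96, ratios)
elapsed = [8.0, 8.0, 24.0, 24.0]  # made-up timings in seconds
ratios = update_ratios(ratios, sizes, elapsed)
next_sizes = split_work(96, ratios)
print(sizes, next_sizes)
```

In this sketch the faster cores receive larger chunks on the next run, so all cores should finish closer to the same time. The smoothing factor trades responsiveness against stability when timings are noisy.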
Taxonomy
Topics: Parallel Computing and Optimization Techniques
