TL;DR
This paper extends oneAPI with co-execution strategies so that a single kernel can run on the CPU and GPU at the same time, and it evaluates how static and dynamic load balancing affect performance and energy efficiency.
Contribution
It adds co-execution strategies to oneAPI, integrates static and dynamic load-balancing algorithms, and evaluates their effect on performance and energy efficiency across regular and irregular HPC benchmarks.
Findings
Co-execution is worthwhile when it uses dynamic load-balancing algorithms.
Unified shared memory improves efficiency further.
The benefits were measured on regular and irregular HPC benchmarks running on a CPU with an integrated GPU.
Abstract
Efficiently programming heterogeneous systems is a major challenge because of the complexity of their architectures. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses these issues. In this paper, oneAPI is extended with co-execution strategies that run the same kernel across different devices, enabling the exploitation of static and dynamic policies. On top of that, static and dynamic load-balancing algorithms are integrated and analyzed. This work evaluates performance and energy efficiency on a well-known set of regular and irregular HPC benchmarks, using an integrated GPU and a CPU. Experimental results show that co-execution is worthwhile when using dynamic algorithms, and that efficiency improves further when using unified shared memory.
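The difference between static and dynamic load balancing can be illustrated with a toy simulation. This is a minimal sketch and not the paper's implementation: it does not use oneAPI or SYCL, and the names `Device`, `throughput`, `static_split`, and `dynamic_split`, as well as every number in the usage example, are hypothetical choices made for illustration. It assumes each device processes work at a fixed rate and that scheduling itself costs nothing.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    throughput: float  # hypothetical rate: cost units processed per second


def static_split(costs, devices, shares):
    """Static policy: each device gets one contiguous slice, sized up front.

    `shares` are fractions of the item count, decided before execution.
    Returns the simulated finish time of every device.
    """
    n = len(costs)
    finish = {}
    start = 0
    for i, (dev, share) in enumerate(zip(devices, shares)):
        # The last device takes whatever remains, so no item is dropped.
        end = n if i == len(devices) - 1 else start + round(n * share)
        finish[dev.name] = sum(costs[start:end]) / dev.throughput
        start = end
    return finish


def dynamic_split(costs, devices, chunk):
    """Dynamic policy: whichever device becomes idle first takes the next chunk.

    Returns the simulated finish time of every device.
    """
    clock = {d.name: 0.0 for d in devices}
    speed = {d.name: d.throughput for d in devices}
    for start in range(0, len(costs), chunk):
        name = min(clock, key=clock.get)  # device that is free earliest
        clock[name] += sum(costs[start:start + chunk]) / speed[name]
    return clock


if __name__ == "__main__":
    # Irregular workload (assumed): the second half costs 9x more per item.
    costs = [1] * 512 + [9] * 512
    devices = [Device("cpu", 100.0), Device("gpu", 300.0)]

    static = static_split(costs, devices, shares=[0.25, 0.75])
    dynamic = dynamic_split(costs, devices, chunk=32)

    print("static  makespan:", max(static.values()))
    print("dynamic makespan:", max(dynamic.values()))
```

In this sketch the static split is sized by item count, so on an irregular workload one device can be left with most of the expensive items while the other sits idle. The dynamic policy corrects this at run time, at the price of more scheduling decisions, which the simulation treats as free and a real system would not.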
