libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms
Daniel Hanlon, Hamidreza Khalighzadeh, Ravi Reddy Manumachu, Alexey Lastovetsky

TL;DR
This paper introduces libhclooc, a library that unifies out-of-core programming for GPUs, PHIs, and FPGAs, enabling efficient large-scale kernel execution with reduced code complexity.
Contribution
It provides the first unified interface for out-of-core accelerator programming across multiple hardware types, improving performance and programmer productivity.
Findings
Libhclooc incurs an overhead of up to 10% compared to optimized vendor implementations.
Using libhclooc reduces the number of lines of code by 75%, improving programmer productivity.
The library enables efficient out-of-core matrix multiplication on diverse accelerators.
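To make the idea of out-of-core matrix multiplication concrete, here is a minimal sketch of the general technique: the result matrix is computed tile by tile, so that only a row panel of A, a column panel of B, and one block of C need to be resident on the accelerator at a time. This is not the libhclooc API. The function names (`matmul_block`, `ooc_matmul`) and the `tile` parameter are hypothetical, the "transfers" are ordinary list slices, and the inner dimension is assumed to fit in device memory.

```python
# Illustrative sketch only; NOT the libhclooc interface.
# All names here are hypothetical and the "device" is simulated on the host.

def matmul_block(a, b):
    """Multiply two small dense blocks (lists of rows).

    Stands in for the accelerator kernel that runs on data already
    resident in device memory.
    """
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def ooc_matmul(a, b, tile):
    """Compute C = A x B one (tile x tile) block of C at a time.

    Only `tile` rows of A and `tile` columns of B are "on the device"
    for each kernel call, mimicking a memory-limited accelerator.
    """
    m, n = len(a), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        a_panel = a[i0:i0 + tile]                       # simulated host-to-device copy
        for j0 in range(0, n, tile):
            b_panel = [row[j0:j0 + tile] for row in b]  # simulated host-to-device copy
            c_block = matmul_block(a_panel, b_panel)    # simulated kernel launch
            for di, row in enumerate(c_block):          # simulated device-to-host copy
                c[i0 + di][j0:j0 + len(row)] = row
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0], [0, 1], [2, 2]]
    print(ooc_matmul(a, b, tile=2))
```

A production implementation would typically also overlap transfers with computation (for example, with streams or double buffering) and might partition the inner dimension as well; the sketch omits both for brevity.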
Abstract
Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and Big data platforms to facilitate execution of workloads that demand high energy efficiency. They present unique interfaces and programming models, and therefore pose several limitations that must be addressed to facilitate execution of large workloads. There is no library providing a unifying interface that allows programmers to write reusable out-of-core implementations of their data-parallel kernels that can run efficiently on different mainstream accelerators such as GPUs, PHIs, and FPGAs. We address this shortage in this paper. We present a library called libhclooc, which provides a unifying interface facilitating out-of-core implementations for data parallel…
