Introducing JIRIAF: A Virtual Kubelet Integration for Optimizing HPC Resource Provisioning
Vardan Gyurjyan, Graham Heyes, Christopher Larrieu, David Lawrence, Jeng-Yuan Tsai

TL;DR
JIRIAF introduces a Kubernetes-based framework with Virtual Kubelet for efficient, flexible HPC resource management across diverse environments, demonstrated through a real-world case study and a digital twin integration.
Contribution
The paper presents JIRIAF, a novel resource management framework that leverages Virtual Kubelet for dynamic HPC workload optimization across heterogeneous systems.
Findings
Effective resource management on NERSC's Perlmutter system
Successful deployment of data-stream processing pipelines
Enhanced system monitoring with a digital twin model
Abstract
The JIRIAF (JLab Integrated Research Infrastructure Across Facilities) framework is designed to streamline resource management and optimize high-performance computing (HPC) workloads across heterogeneous environments. Central to JIRIAF is the JIRIAF Resource Manager (JRM), which effectively leverages Kubernetes and Virtual Kubelet to manage resources dynamically, even in environments with restricted user privileges. By operating in userspace, JRM facilitates the execution of user applications as containers across diverse computing sites, ensuring unified control and monitoring. The framework's effectiveness is demonstrated through a case study involving the deployment of data-stream processing pipelines on the Perlmutter system at NERSC, showcasing its capability to manage large-scale HPC applications efficiently. Additionally, we discuss the integration of a digital twin model for a…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management
