Auto-scaling HTCondor pools using Kubernetes compute resources

Igor Sfiligoi; Thomas DeFanti; Frank Würthwein

arXiv:2205.01004·cs.DC·July 11, 2022

TL;DR

This paper presents a method for automatically scaling HTCondor pools by integrating them with Kubernetes-managed compute resources, enabling demand-driven provisioning in both on-premises and cloud environments.

Contribution

The paper introduces a solution for autonomous, demand-driven provisioning of Kubernetes-managed resources for HTCondor pools, applicable to both on-prem and Cloud deployments.

Findings

01. Effective integration of Kubernetes with HTCondor for auto-scaling.

02. Successful deployment in on-premises and cloud environments.

03. Supports multiple Open Science Grid communities.

Abstract

HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor's system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allows for autonomous, demand-driven provisioning of Kubernetes-managed resources. A high-level overview of the employed architectures is presented, paired with a description of the setups used in both on-prem and Cloud deployments in support of several Open Science Grid communities. The experience suggests that the described solution should be generally suitable for contributing Kubernetes-based resources to existing HTCondor pools.
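
To make "demand-driven provisioning" concrete, the following is a minimal sketch of one possible scaling decision, not the authors' implementation. It assumes a simple policy in which the target number of Kubernetes worker pods equals the workers that are currently busy plus enough new workers to absorb the idle jobs in the HTCondor queue, bounded by a configured minimum and maximum. The function name and all parameters (`slots_per_worker`, `min_workers`, `max_workers`) are hypothetical; how the idle-job count is obtained and how the target is applied to the cluster are deliberately left out.

```python
import math


def desired_worker_count(idle_jobs, busy_workers,
                         slots_per_worker=1, min_workers=0, max_workers=10):
    """Return a target number of worker pods for the pool.

    Hypothetical policy (an assumption, not taken from the paper):
    keep every busy worker, add enough workers to cover the idle
    jobs, then clamp the result to [min_workers, max_workers].
    """
    if slots_per_worker < 1:
        raise ValueError("slots_per_worker must be at least 1")
    if idle_jobs < 0 or busy_workers < 0:
        raise ValueError("job and worker counts must be non-negative")

    # Workers needed to give each idle job a slot, rounded up.
    needed_for_idle = math.ceil(idle_jobs / slots_per_worker)

    # Never scale below what is already busy, unless capped by max_workers.
    target = busy_workers + needed_for_idle
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    # 5 idle jobs, 2 busy workers, 2 job slots per worker.
    print(desired_worker_count(5, 2, slots_per_worker=2))
```

In a real deployment this decision would run periodically, with the idle-job count read from the HTCondor schedd and the result applied as the replica count of a Kubernetes workload. With no idle jobs the target falls back to the busy workers, so the pool shrinks as running jobs finish.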
