HTCondor data movement at 100 Gbps
Igor Sfiligoi, Frank Würthwein, Thomas DeFanti, John Graham

TL;DR
This paper demonstrates that HTCondor can serve data at up to 90 Gbps from a node with a 100 Gbps network interface, which is sufficient for most current distributed high throughput computing use cases.
Contribution
It shows that HTCondor's native data movement, which routes input and output through the submission node, can run close to the capacity of a 100 Gbps network interface.
Findings
Achieves up to 90 Gbps data transfer rate with 100 Gbps NIC
Would saturate the border network links of most research universities at the time of writing
Validates HTCondor's suitability for high-throughput data tasks
Abstract
HTCondor is a major workload management system used in distributed high throughput computing (dHTC) environments, e.g., the Open Science Grid. One of the distinguishing features of HTCondor is its native support for data movement, which allows it to operate without a shared filesystem. Coupling data handling and compute scheduling is both convenient for users and allows for significant infrastructure flexibility, but it does introduce some limitations. The default HTCondor data transfer mechanism routes both the input and output data through the submission node, making it a potential bottleneck. In this document we show that, by using a node equipped with a 100 Gbps network interface (NIC), HTCondor can serve data at up to 90 Gbps, which is sufficient for most current use cases, as it would saturate the border network links of most research universities at the time of writing.
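To put the 90 Gbps figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not something taken from the paper: the function names, the 1 TB dataset size, and the count of 200 concurrent transfers are hypothetical, and the even split across transfers is a simplifying assumption.

```python
# Back-of-the-envelope throughput arithmetic for a single submission node.
# Only the 90 Gbps and 100 Gbps figures come from the abstract; the dataset
# size and job count used below are hypothetical illustration values.

MEASURED_GBPS = 90.0   # peak rate reported in the abstract
NIC_GBPS = 100.0       # nominal NIC capacity


def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (decimal units)."""
    return gbps / 8.0


def transfer_seconds(total_gigabytes: float, gbps: float) -> float:
    """Time to move total_gigabytes at a sustained rate of gbps."""
    return total_gigabytes / gbps_to_gigabytes_per_sec(gbps)


def per_transfer_gbps(aggregate_gbps: float, concurrent_transfers: int) -> float:
    """Average share per transfer, assuming the aggregate is split evenly."""
    if concurrent_transfers <= 0:
        raise ValueError("concurrent_transfers must be positive")
    return aggregate_gbps / concurrent_transfers


if __name__ == "__main__":
    print("NIC utilization:", MEASURED_GBPS / NIC_GBPS)
    print("GB/s at peak:", gbps_to_gigabytes_per_sec(MEASURED_GBPS))
    # Hypothetical workload: 1 TB of input shared by 200 concurrent transfers.
    print("Seconds for 1 TB:", transfer_seconds(1000.0, MEASURED_GBPS))
    print("Gbps per transfer:", per_transfer_gbps(MEASURED_GBPS, 200))
```

Since all traffic passes through the submission node in the default mechanism, the aggregate rate is what bounds the workload: at 90 Gbps the node moves 11.25 GB per second, and the average share available to each transfer shrinks as more jobs start or finish at once.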
