Ten Simple Rules for Success with HPC, i.e. Responsibly BASHing that Linux Cluster

Jamie J. Alnasir

arXiv:2101.06737 · cs.DC · September 1, 2021

TL;DR

This paper offers practical guidelines for using high-performance computing (HPC) clusters effectively and responsibly. It is aimed at both novice and experienced users, helping them improve their utilisation of the cluster and avoid common pitfalls that negatively affect other users.

Contribution

It provides a set of ten simple, actionable rules applicable across various HPC platforms to improve user practices and system efficiency.

Findings

01. Enhanced user understanding of HPC best practices

02. Reduced common user errors and system load

03. Improved overall HPC resource utilization

Abstract

High-performance computing (HPC) clusters are widely used in-house at scientific and academic research institutions. For some users, the transition from running their analyses on a single workstation to running them on a complex, multi-tenanted cluster, usually employing some degree of parallelism, can be challenging, if not bewildering, especially for users whose role is not predominantly computational in nature. There are also more experienced users who can benefit from pointers on how to get the best from their use of HPC. This Ten Simple Rules guide is aimed at helping you identify ways to improve your utilisation of HPC and avoid common pitfalls that can negatively impact other users; it will also help ease the load (pun intended) on your HPC sysadmin. It is intended to provide technical advice common to the use of HPC platforms such as LSF, Slurm, PBS/Torque, SGE,…
