Installing, Running and Maintaining Large Linux Clusters at CERN
Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim Smith

TL;DR
This paper shares CERN's practical experience in managing large Linux clusters, focusing on scalability, automation, security, and adapting shared interactive and batch services to users' changing requirements.
Contribution
It details the tools and processes developed at CERN for managing large Linux clusters, covering installation, configuration, monitoring, and grid integration, and draws on five years of operational experience.
Findings
Improved cluster manageability through new tools and processes
Enhanced responsiveness and system utilization for user services
Progress in scaling and gridifying heterogeneous Linux clusters
Abstract
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC-scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system installation, configuration, and monitoring. In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have introduced on our LSF-based systems to maximise both responsiveness to users and overall system utilisation.…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Particle Detector Development and Performance
