Automating ATLAS Computing Operations using the Site Status Board
Julia Andreeva, Carlos Borrego Iglesias, Simone Campana, Alessandro Di Girolamo, Ivan Dzhunov, Xavier Espinal Curull, Stavro Gayazov, Erekle Magradze, Michal Maciej Nowotka, Lorenzo Rinaldi, Pablo Saiz, Jaroslava Schovancova, Graeme Andrew Stewart, Michael Wright

TL;DR
This paper discusses the integration and automation of the ATLAS Site Status Board to enhance distributed computing operations, improve reliability, and reduce manpower costs through real-time monitoring and automatic site exclusion.
Contribution
It presents the implementation of the ATLAS SSB sensors and alarm system, demonstrating its positive impact on computing performance and outlining future development plans.
Findings
Improved system reliability through automation.
Enhanced monitoring with real-time data and history tracking.
Reduced manual intervention in site management.
Abstract
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework that allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment uses the SSB intensively for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, the usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of…
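The abstract does not give the formula used to derive usability from the metric history, so the following is only a minimal sketch of one plausible reading: usability as the fraction of a time window a site spent in a good state, with exclusion when that fraction falls below a threshold. The names `StatusInterval`, `usability`, and `should_exclude`, the status labels, and the threshold value are all hypothetical and are not taken from the SSB itself. The sketch also assumes that the history intervals do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusInterval:
    """One entry of a metric history (hypothetical structure)."""
    start: float   # start time, e.g. seconds since epoch
    end: float     # end time, same unit as start
    status: str    # assumed labels: "OK", "WARNING", "CRITICAL"


def usability(history, window_start, window_end, good=("OK",)):
    """Fraction of the window during which the site was in a good state.

    Assumes the intervals in `history` do not overlap.
    """
    total = window_end - window_start
    if total <= 0:
        raise ValueError("window must have positive length")
    good_time = 0.0
    for interval in history:
        # Clip each interval to the evaluation window.
        lo = max(interval.start, window_start)
        hi = min(interval.end, window_end)
        if hi > lo and interval.status in good:
            good_time += hi - lo
    return good_time / total


def should_exclude(history, window_start, window_end, threshold=0.8):
    """Flag a site for exclusion when usability is below the threshold.

    The default threshold is an illustrative assumption.
    """
    return usability(history, window_start, window_end) < threshold


# Example history: good for 8 of 10 time units.
history = [
    StatusInterval(0, 6, "OK"),
    StatusInterval(6, 8, "CRITICAL"),
    StatusInterval(8, 10, "OK"),
]
site_usability = usability(history, 0, 10)
exclude_strict = should_exclude(history, 0, 10, threshold=0.9)
```

Keeping the usability calculation separate from the exclusion decision means the same history can feed both a monitoring view and an automated action, with the threshold tuned independently.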
