The CMS monitoring infrastructure and applications
Christian Ariza-Porras, Valentin Kuznetsov, Federica Legger

TL;DR
This paper describes the design, implementation, and current status of the CMS monitoring infrastructure, which supports efficient operation and performance evaluation of the experiment's globally distributed computing system.
Contribution
It introduces a scalable, open-source monitoring architecture tailored for the CMS experiment's complex distributed computing environment.
Findings
Real-time and historical monitoring capabilities implemented
Scalable architecture successfully supports large data volumes
Future developments aim to enhance system robustness and features
Abstract
The globally distributed computing infrastructure required to cope with the multi-petabyte datasets produced by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN comprises several subsystems, such as workload management, data management, data transfers, and submission of users' and centrally managed production requests. The performance and status of all subsystems must be constantly monitored to guarantee the efficient operation of the whole infrastructure. Moreover, key metrics need to be tracked to evaluate and study the system performance over time. The CMS monitoring architecture allows both real-time and historical monitoring of a variety of data sources and is based on scalable and open-source solutions tailored to satisfy the experiment's monitoring needs. We present the monitoring data flow and software architecture for the CMS distributed…
