ATLAS job monitoring in the Dashboard Framework

L Sargsyan; J Andreeva; S Campana; E Karavakis; L Kokoszkiewicz; P Saiz; J Schovancova; D Tuckett (on behalf of the ATLAS Collaboration)

arXiv:1905.12949·physics.ins-det·May 31, 2019

TL;DR

The paper describes the implementation of a comprehensive job monitoring system within the ATLAS experiment's Dashboard Framework, integrating multiple data sources to improve real-time monitoring and scalability.

Contribution

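It introduces a unified job monitoring solution for ATLAS that consolidates data from the PanDA job processing database, the Production system database and jobs submitted through GANGA to the WMS or local batch systems. This reduces the load on the PanDA database and overcomes the scale limitations caused by its short job rotation cycle.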

Findings

01 Enhanced real-time monitoring capabilities.

02 Reduced load on the PanDA database.

03 Improved scalability of job monitoring.

Abstract

Monitoring of the large-scale data processing of the ATLAS experiment includes monitoring of production and user analysis jobs. The Experiment Dashboard provides a common job monitoring solution, which is shared by the ATLAS and CMS experiments. This includes an accounting portal as well as real-time monitoring. Dashboard job monitoring for ATLAS combines information from the PanDA job processing database, the Production system database and monitoring information from jobs submitted through GANGA to the Workload Management System (WMS) or to local batch systems. Using Dashboard-based job monitoring applications will decrease the load on the PanDA database and overcome the scale limitations in PanDA monitoring caused by the short job rotation cycle in the PanDA database. Aggregating task and job metrics from these different sources provides a complete view of job processing activity across ATLAS.
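The following minimal Python sketch roughly illustrates the aggregation idea in the abstract. It merges job status records from several sources and counts jobs per status for each task. The record fields, the source labels ("panda", "prodsys", "ganga") and the rule that the most recent update wins are all hypothetical assumptions made for illustration. They are not the Dashboard's actual schema or merge logic.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical record layout; not the Dashboard's real schema.
    job_id: str
    task_id: str
    source: str      # illustrative labels such as "panda", "prodsys", "ganga"
    status: str      # e.g. "running", "finished", "failed"
    timestamp: int   # time of the last status update, in seconds


def merge_latest(*sources: Iterable[JobRecord]) -> Dict[str, JobRecord]:
    """Keep the most recent record for each job_id across all sources (assumed rule)."""
    latest: Dict[str, JobRecord] = {}
    for records in sources:
        for rec in records:
            prev = latest.get(rec.job_id)
            if prev is None or rec.timestamp > prev.timestamp:
                latest[rec.job_id] = rec
    return latest


def task_metrics(latest: Dict[str, JobRecord]) -> Dict[str, Counter]:
    """Count jobs per status for each task."""
    metrics: Dict[str, Counter] = defaultdict(Counter)
    for rec in latest.values():
        metrics[rec.task_id][rec.status] += 1
    return dict(metrics)


# Example with made-up data: job j1 is reported by two sources.
panda = [JobRecord("j1", "t1", "panda", "running", 100),
         JobRecord("j2", "t1", "panda", "finished", 120)]
prodsys = [JobRecord("j1", "t1", "prodsys", "finished", 150)]
ganga = [JobRecord("j3", "t2", "ganga", "failed", 90)]

metrics = task_metrics(merge_latest(panda, prodsys, ganga))
print({task: dict(counts) for task, counts in metrics.items()})
# prints {'t1': {'finished': 2}, 't2': {'failed': 1}}
```

In this sketch the merged dictionary is kept separately from the sources. A job that later rotates out of a source table would therefore still be counted. This loosely mirrors the motivation given in the abstract, but it is not a description of how the Dashboard actually stores its data.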