The archive solution for distributed workflow management agents of the CMS experiment at LHC

Valentin Kuznetsov; Nils Leif Fischer; Yuyi Guo

arXiv:1801.03872·hep-ex·January 12, 2018

TL;DR

The paper describes the design and implementation of the Workflow Management Archive system for the CMS experiment at CERN, enabling reliable storage and analysis of large-scale workflow reports using modern big data technologies.

Contribution

It introduces an archive system that integrates a document-oriented database with the Hadoop ecosystem to manage unstructured workflow data in high-energy physics efficiently.

Findings

01

Processes on the order of one million ($O$(1M)) documents per day

02

Enabled performance monitoring through custom query and aggregation pipelines

03

Integrated with CERN's existing computing infrastructure, including the central HDFS and the Hadoop Spark cluster

Abstract

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate $O$(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics that help CMS data operators assess the performance of the CMS computing system.
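To make the idea of an aggregation pipeline over job reports concrete, here is a minimal sketch in plain Python. It is not the paper's implementation, which runs on the Hadoop Spark cluster over data in HDFS. The document fields (`site`, `exit_code`, `cpu_time_s`, `wall_time_s`), the site names, and the helper `aggregate_by_site` are all hypothetical simplifications and do not reflect the real framework job report schema. The sketch groups reports by site and reduces each group to a few performance metrics, the same group-then-reduce shape a Spark job would distribute across the cluster.

```python
from collections import defaultdict

# Hypothetical, heavily simplified job-report documents (illustration only;
# the real framework job report schema is much richer).
REPORTS = [
    {"site": "site_A", "exit_code": 0, "cpu_time_s": 3600.0, "wall_time_s": 4000.0},
    {"site": "site_A", "exit_code": 1, "cpu_time_s": 120.0, "wall_time_s": 300.0},
    {"site": "site_B", "exit_code": 0, "cpu_time_s": 7200.0, "wall_time_s": 7500.0},
]


def aggregate_by_site(reports):
    """Group reports by site, then reduce each group to simple metrics."""
    # Group step: collect the documents belonging to each site.
    groups = defaultdict(list)
    for doc in reports:
        groups[doc["site"]].append(doc)

    # Reduce step: job count, failure rate (non-zero exit codes),
    # and CPU efficiency (total CPU time / total wall-clock time).
    metrics = {}
    for site, docs in groups.items():
        failures = sum(1 for d in docs if d["exit_code"] != 0)
        total_cpu = sum(d["cpu_time_s"] for d in docs)
        total_wall = sum(d["wall_time_s"] for d in docs)
        metrics[site] = {
            "jobs": len(docs),
            "failure_rate": failures / len(docs),
            "cpu_efficiency": total_cpu / total_wall,
        }
    return metrics


if __name__ == "__main__":
    for site, m in sorted(aggregate_by_site(REPORTS).items()):
        print(site, m)
```

For these sample documents, `site_A` reports 2 jobs with a failure rate of 0.5, and `site_B` reports 1 job with a CPU efficiency of 0.96. Metrics of this kind are what a dashboard would plot over time.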
