The archive solution for distributed workflow management agents of the CMS experiment at LHC
Valentin Kuznetsov, Nils Leif Fischer, Yuyi Guo

TL;DR
The paper describes the design and implementation of the Workflow Management Archive system for the CMS experiment at CERN, enabling reliable storage and analysis of large-scale workflow reports using modern big data technologies.
Contribution
It introduces an archive system that integrates a document-oriented database with the Hadoop ecosystem to manage unstructured workflow data in high-energy physics.
Findings
Reliably processed, stored, and aggregated on the order of 1 million documents daily
Enabled performance monitoring through custom query and aggregation pipelines
Integrated with CERN's existing computing infrastructure
Abstract
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
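To make the aggregation idea more concrete, here is a minimal sketch of how job report documents could be reduced to daily per-site metrics. It is not the paper's implementation: the function name `aggregate_daily` and the document fields `timestamp`, `site`, `exit_code`, and `cpu_seconds` are hypothetical, chosen only for illustration, and the example runs in memory rather than on a Spark cluster.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(docs):
    """Group report documents by (UTC day, site) and summarise each group.

    All field names used here are assumptions for illustration,
    not the schema used by the CMS archive.
    """
    buckets = defaultdict(lambda: {"jobs": 0, "failed": 0, "cpu_seconds": 0.0})
    for doc in docs:
        # Convert the (assumed) UNIX timestamp into a UTC calendar day.
        day = datetime.fromtimestamp(doc["timestamp"], tz=timezone.utc).date().isoformat()
        bucket = buckets[(day, doc.get("site", "unknown"))]
        bucket["jobs"] += 1
        # Treat any non-zero exit code as a failed job.
        if doc.get("exit_code", 0) != 0:
            bucket["failed"] += 1
        bucket["cpu_seconds"] += float(doc.get("cpu_seconds", 0.0))
    # Return plain dicts, ordered by (day, site).
    return {key: dict(value) for key, value in sorted(buckets.items())}


# Hypothetical sample documents.
docs = [
    {"timestamp": 0, "site": "SITE_A", "exit_code": 0, "cpu_seconds": 10.0},
    {"timestamp": 3600, "site": "SITE_A", "exit_code": 1, "cpu_seconds": 5.0},
    {"timestamp": 86400, "site": "SITE_B", "cpu_seconds": 2.5},
]
result = aggregate_daily(docs)
```

Summaries of this shape are small enough to keep in a short-term store and plot, while the full documents would go to long-term storage.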
