Towards automation of data quality system for CERN CMS experiment
Maxim Borisyak, Fedor Ratnikov, Denis Derkach, Andrey Ustyuzhanin

TL;DR
This paper presents a machine learning-based system for automating data quality monitoring in the CMS experiment at CERN, reducing manual effort by automatically classifying at least 20% of data samples.
Contribution
It introduces an automated data quality monitoring approach that leverages partial manual labels and machine learning to classify data, improving efficiency in large-scale experiments.
Findings
Automatically classifies at least 20% of data samples.
Uses machine learning to classify marginal good and bad cases, deferring "grey area" cases to human experts.
Introduces no noticeable degradation of the data quality assessment.
Abstract
Daily operation of a large-scale experiment is a challenging task, particularly from the perspective of routinely monitoring the quality of the data being taken. We describe an approach that uses machine learning to automate data quality monitoring, based on partial use of data labeled manually by detector experts. The system automatically classifies the marginal cases of both good and bad data, and relies on human expert decisions for the remaining "grey area" cases. This study uses collision data collected by the CMS experiment at the LHC in 2010. We demonstrate that the proposed workflow can automatically process at least 20% of the samples without noticeable degradation of the result.
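The abstract describes a three-way decision: the classifier settles the marginal good and bad cases and defers the "grey area" to detector experts. A minimal sketch of such a scheme follows, assuming a classifier that outputs a score where higher means "more likely good", and two thresholds tuned on expert-labeled data so that the automatically labeled samples contain at most a chosen error fraction. The functions `choose_thresholds`, `classify` and `automated_fraction`, the threshold-selection rule, and the toy scores are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical decision labels for the three-way split.
GOOD, BAD, GREY = "good", "bad", "grey"


def choose_thresholds(scores: Sequence[float], labels: Sequence[int],
                      max_error: float = 0.0) -> Tuple[float, float]:
    """Pick (t_bad, t_good) on expert-labeled data (label 1 = good, 0 = bad).

    Assumed rule: t_good is the lowest score such that, among samples with
    score >= t_good, the fraction of bad ones is <= max_error; t_bad is the
    highest score such that, among samples with score <= t_bad, the fraction
    of good ones is <= max_error. Assumes distinct scores.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    t_good = float("inf")  # inf: nothing is auto-labeled good
    wrong = 0
    for k, (s, y) in enumerate(reversed(pairs), start=1):
        wrong += (y == 0)  # bad sample that would be called good
        if wrong / k <= max_error:
            t_good = s
    t_bad = float("-inf")  # -inf: nothing is auto-labeled bad
    wrong = 0
    for k, (s, y) in enumerate(pairs, start=1):
        wrong += (y == 1)  # good sample that would be called bad
        if wrong / k <= max_error:
            t_bad = s
    return t_bad, t_good


def classify(score: float, t_bad: float, t_good: float) -> str:
    """Auto-label clear-cut cases; send everything else to the experts."""
    if score >= t_good:
        return GOOD
    if score <= t_bad:
        return BAD
    return GREY


def automated_fraction(decisions: List[str]) -> float:
    """Fraction of samples that needed no expert decision."""
    return sum(d != GREY for d in decisions) / len(decisions)


# Toy, made-up classifier scores and expert labels (not CMS data).
SCORES = [0.05, 0.1, 0.2, 0.35, 0.5, 0.6, 0.7, 0.85, 0.9, 0.95]
LABELS = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    t_bad, t_good = choose_thresholds(SCORES, LABELS)
    decisions = [classify(s, t_bad, t_good) for s in SCORES]
    print(t_bad, t_good, decisions, automated_fraction(decisions))
```

On the toy data with `max_error=0`, three samples are auto-labeled bad, five good, and the two overlapping ones are left to experts. Raising `max_error` trades a larger automated fraction for some misclassified samples; in practice the thresholds would be tuned on held-out labeled data rather than on the samples being classified.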
