Virtual Data in CMS Analysis
A.Arbree, P.Avery, D.Bourilkov, R.Cavanaugh, J.Rodriguez, G.Graham, M.Wilde, Y.Zhao

TL;DR
This paper explores the use of virtual data systems to improve collaboration, reproducibility, and analysis management in CMS experiment data analysis, demonstrating a prototype based on Chimera and ROOT.
Contribution
It introduces a virtual data framework for CMS analysis that organizes parameter spaces, logs provenance, creates checkpoints, and facilitates analysis auditing and reproduction.
Findings
Prototype successfully chains analysis steps including Monte Carlo, simulation, reconstruction, and visualization.
Enhances collaboration by sharing parameter spaces and checkpoints.
Improves reproducibility and auditability of complex data analyses.
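As a rough illustration of the ideas above (provenance logging, shared checkpoints, and independent analysis branches), the following minimal Python sketch records how each data product is derived. It is not the Chimera or ROOT interface: the class `VirtualDataCatalog`, its methods, the step names, and all parameter names and values are hypothetical and chosen only for this example.

```python
import hashlib
import json


class VirtualDataCatalog:
    """Toy catalog (hypothetical): records how each data product was derived."""

    def __init__(self):
        self.records = {}  # product id -> derivation record

    def derive(self, step, params, inputs=()):
        # A product is identified by its step name, parameters, and input ids,
        # so an identical request always maps to the same id.
        record = {"step": step, "params": params, "inputs": list(inputs)}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode())
        product_id = digest.hexdigest()[:12]
        self.records.setdefault(product_id, record)
        return product_id

    def provenance(self, product_id):
        """Return the step names leading to product_id, earliest first."""
        chain = []
        seen = set()

        def visit(pid):
            if pid in seen:
                return
            seen.add(pid)
            record = self.records[pid]
            for parent in record["inputs"]:
                visit(parent)
            chain.append(record["step"])

        visit(product_id)
        return chain


catalog = VirtualDataCatalog()

# A linear chain of steps; every name and value here is illustrative only.
mc = catalog.derive("monte_carlo", {"n_events": 1000, "seed": 42})
sim = catalog.derive("simulation", {"detector": "toy"}, [mc])
reco = catalog.derive("reconstruction", {"min_pt": 5.0}, [sim])

# "reco" acts as a checkpoint: two collaborators branch from it with
# different selections without repeating the earlier steps.
tight = catalog.derive("selection", {"min_pt": 20.0}, [reco])
loose = catalog.derive("selection", {"min_pt": 10.0}, [reco])

print(catalog.provenance(tight))
```

Because a product's id is derived from its full derivation record, a second group that requests the same step with the same parameters and inputs obtains the same id, which is one simple way to think about auditing and reproducing a result.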
Abstract
The use of virtual data for enhancing the collaboration between large groups of scientists is explored in several ways:
- by defining "virtual" parameter spaces which can be searched and shared in an organized way by a collaboration of scientists in the course of their analysis;
- by providing a mechanism to log the provenance of results and the ability to trace them back to the various stages in the analysis of real or simulated data;
- by creating "check points" in the course of an analysis to permit collaborators to explore their own analysis branches by refining selections, improving the signal to background ratio, varying the estimation of parameters, etc.;
- by facilitating the audit of an analysis and the reproduction of its results by a different group, or in a peer review context.
We describe a prototype for the analysis of data from the CMS experiment based on…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Particle Detector Development and Performance
