Enabling distributed analysis for ALICE Run 3
Raluca Cruceru (on behalf of the ALICE Collaboration)

TL;DR
This paper describes the development and deployment of a distributed analysis infrastructure for ALICE Run 3, enabling efficient processing of large data samples from upgraded detector systems.
Contribution
It introduces the O² analysis framework and the Hyperloop train system, tailored for large-scale data analysis on distributed computing resources.
Findings
Framework successfully processed converted Run 2 data
Systems are operational with LHC pilot beam data
Ready for Run 3 data analysis
Abstract
The ALICE Collaboration has just finished a major detector upgrade that increases the data-taking rate capability by two orders of magnitude and will allow the collection of unprecedented data samples. For example, the analysis input for one month of Pb-Pb collisions amounts to about 5 PB. To enable analysis of such large data samples, the ALICE distributed infrastructure was revised and dedicated tools for Run 3 analysis were created. The first is the analysis framework, implemented in C++, which builds on a multi-process architecture whose processes exchange a flat data format through shared memory. The second is the Hyperloop train system for distributed analysis on the Grid and on dedicated analysis facilities, implemented in Java/JavaScript/React. These systems have been commissioned with converted Run 2 data and with the recent LHC pilot beam and are ready for data analysis for the…
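To make the idea of "a multi-process architecture exchanging a flat data format through shared memory" concrete, here is a minimal sketch of the general technique, not the O² implementation. It assumes a hypothetical fixed-capacity struct-of-arrays table (`FlatTrackTable`, with made-up `pt` and `eta` columns) placed in an anonymous POSIX shared mapping: a producer process fills the columns in place and a consumer process reads them directly, with no serialization or copying. Because the layout contains no pointers, it is valid at any address in any process that maps it. All names here are illustrative.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical flat layout: a header (row count) followed by contiguous columns.
// No pointers inside, so every process can read it at whatever address it is mapped.
constexpr std::size_t kMaxRows = 1024;

struct FlatTrackTable {
  std::uint64_t nRows;   // number of filled rows
  float pt[kMaxRows];    // column: transverse momentum (illustrative)
  float eta[kMaxRows];   // column: pseudorapidity (illustrative)
};

// Producer: fills the columns in place (runs in the child process).
void produceTracks(FlatTrackTable* table, std::uint64_t n) {
  table->nRows = n;
  for (std::uint64_t i = 0; i < n; ++i) {
    table->pt[i] = 0.5f * static_cast<float>(i + 1);
    table->eta[i] = (i % 2 == 0) ? 0.3f : -0.3f;
  }
}

// Consumer: reads the columns straight from shared memory, with no copy
// and no deserialization step.
double sumPtInEtaWindow(const FlatTrackTable* table, float etaMax) {
  double sum = 0.0;
  for (std::uint64_t i = 0; i < table->nRows; ++i) {
    if (table->eta[i] < etaMax && table->eta[i] > -etaMax) sum += table->pt[i];
  }
  return sum;
}

// Producer runs in a forked child, consumer in the parent. Returns -1 on error.
double runSharedMemoryDemo(std::uint64_t nTracks) {
  if (nTracks > kMaxRows) return -1.0;
  void* mem = mmap(nullptr, sizeof(FlatTrackTable), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return -1.0;
  // Anonymous mappings are zero-filled, so nRows starts at 0.
  auto* table = new (mem) FlatTrackTable;

  pid_t pid = fork();
  if (pid < 0) {
    munmap(mem, sizeof(FlatTrackTable));
    return -1.0;
  }
  if (pid == 0) {  // child process: producer
    produceTracks(table, nTracks);
    _exit(0);
  }

  int status = 0;  // parent process: wait for the producer to finish
  waitpid(pid, &status, 0);
  double result = -1.0;
  if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
    result = sumPtInEtaWindow(table, 0.8f);
  }
  munmap(mem, sizeof(FlatTrackTable));
  return result;
}
```

For example, `runSharedMemoryDemo(4)` has the child write four rows and the parent sum their `pt` values directly from the shared mapping. A real system would also need a synchronization and ownership protocol between many concurrent producers and consumers. Here, `waitpid` stands in for that protocol.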
Taxonomy
Topics: Particle physics theoretical and experimental studies · High-Energy Particle Collisions Research · Distributed and Parallel Computing Systems
