Data Intensive High Energy Physics Analysis in a Distributed Cloud
R.J. Sobie, A. Agarwal, M. Anderson, P. Armstrong, K. Fransham, I. Gable, D. Harris, C. Leavett-Brown, M. Paterson, D. Penfold-Brown, M. Vliet, A. Charbonneau, R. Impey, W. Podaima

TL;DR
This paper demonstrates that distributed IaaS cloud infrastructure can be effectively utilized for high energy physics data analysis, enabling scalable, high-throughput computing across multiple cloud platforms.
Contribution
It introduces a distributed cloud system that supports large data sets and high throughput for physics analysis, adaptable to various applications and scalable to thousands of jobs.
Findings
Runs one hundred simultaneous jobs efficiently
Uses central database and data streaming for calibration and data access
Scalable to thousands of user jobs
Abstract
We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We describe the process in which a user prepares an analysis virtual machine (VM) and submits batch jobs to a central scheduler. The system boots the user-specific VM on one of the IaaS clouds, runs the jobs and returns the output to the user. The user application accesses a central database for calibration data during the execution of the application. Similarly, the data is located in a central location and streamed by the running application. The system can easily run one hundred simultaneous jobs…
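The submit-and-boot flow described in the abstract can be sketched as a toy central scheduler that places queued jobs on whichever cloud has a free VM slot. This is only an illustrative sketch, not the authors' implementation: the names `Cloud`, `Job`, `schedule`, `cloud-a`, `cloud-b`, and `analysis-vm` are hypothetical, and the first-fit placement policy is an assumption made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """A hypothetical IaaS cloud with a fixed number of VM slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.slots


@dataclass
class Job:
    """A batch job that names the user-prepared analysis VM it needs."""
    job_id: int
    vm_image: str


def schedule(jobs, clouds):
    """Place queued jobs on the first cloud with a free slot.

    Returns (placements, still_queued). Booting the VM is modelled
    simply as occupying a slot on the chosen cloud.
    """
    queue = deque(jobs)
    placements = {}
    while queue:
        cloud = next((c for c in clouds if c.has_capacity()), None)
        if cloud is None:
            break  # every cloud is full; remaining jobs stay queued
        job = queue.popleft()
        cloud.running.append(job)  # "boot" the job's VM on this cloud
        placements[job.job_id] = cloud.name
    return placements, list(queue)


clouds = [Cloud("cloud-a", slots=2), Cloud("cloud-b", slots=1)]
jobs = [Job(i, "analysis-vm") for i in range(4)]
placements, waiting = schedule(jobs, clouds)
print(placements)
print([j.job_id for j in waiting])
```

With three slots in total and four jobs, one job remains queued until a slot frees up. A real system would also need to stream the input data, query the calibration database, and return the output to the user, none of which is modelled here.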
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Cloud Computing and Resource Management
