Advancements in Big Data Processing in the ATLAS and CMS Experiments
A.V. Vaniachine (on behalf of the ATLAS and CMS Collaborations)

TL;DR
This paper discusses how big data processing challenges in high-energy physics experiments like ATLAS and CMS are addressed through task splitting, fault tolerance, and reliability engineering to enable petascale data analysis.
Contribution
It introduces new methods for task management and fault recovery in petascale data processing on Grid computing infrastructures, based on experiences from ATLAS and CMS.
Findings
Task splitting enables fine-granularity checkpointing.
Automatic retries improve reliability of large-scale data processing.
Reliability engineering practices from the experiments may scale to broader applications.
Abstract
The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became a unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enabled fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of the dropped TCP/IP packets. Similarly, transient job…
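The analogy in the abstract (a task split into jobs, with failed pieces re-sent like dropped packets) can be illustrated with a minimal sketch. This is not the ATLAS or CMS production system: the names `Job`, `TransientError`, `split_task`, `run_task`, and the `max_attempts` limit are all hypothetical, chosen only to show the idea that each job acts as a checkpoint and only the failed jobs are retried.

```python
from dataclasses import dataclass
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical marker for a recoverable failure (e.g. a brief site outage)."""


@dataclass
class Job:
    job_id: int
    files: List[str]
    attempts: int = 0
    done: bool = False


def split_task(dataset_files: List[str], files_per_job: int) -> List[Job]:
    """Split one task (a whole dataset) into jobs of at most files_per_job files."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [
        Job(job_id=n, files=dataset_files[start:start + files_per_job])
        for n, start in enumerate(range(0, len(dataset_files), files_per_job))
    ]


def run_task(jobs: List[Job], run_job: Callable[[Job], None],
             max_attempts: int = 3) -> List[Job]:
    """Run every job, retrying only the jobs that hit a transient failure.

    Finished jobs are never re-run, so each job serves as a checkpoint.
    Returns the jobs that are still unfinished after max_attempts tries.
    """
    for job in jobs:
        while not job.done and job.attempts < max_attempts:
            job.attempts += 1
            try:
                run_job(job)
                job.done = True
            except TransientError:
                pass  # assumed recoverable: loop around and try again
    return [job for job in jobs if not job.done]


def flaky_run(job: Job) -> None:
    """Illustrative payload: even-numbered jobs fail on their first attempt."""
    if job.job_id % 2 == 0 and job.attempts == 1:
        raise TransientError(f"job {job.job_id} hit a transient failure")


jobs = split_task([f"file_{i}.root" for i in range(10)], files_per_job=3)
unfinished = run_task(jobs, flaky_run)
```

In this sketch, a transient failure costs one repeated job instead of a restart of the whole task, which is the sense in which splitting provides fine-granularity checkpointing. Failures that persist past `max_attempts` are returned to the caller instead of being retried indefinitely.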
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Big Data Technologies and Applications
