Operational Aspects of Dealing with the Large BaBar Data Set

Tofigh Azemoon; Adil Hasan; Wilko Kroeger; Artem Trunov

arXiv:cs/0306061·cs.DB·May 23, 2007

TL;DR

This paper discusses the operational challenges and solutions involved in managing and providing access to the massive, distributed BaBar experiment data set, which exceeds 0.7 petabytes, across multiple collaborating institutes.

Contribution

It details the operational strategies and issues encountered in handling large, distributed scientific data sets in a high-energy physics experiment.

Findings

01. Effective data management strategies for large-scale distributed datasets.

02. Identification of common problems in handling big scientific data.

03. Insights into importing and exporting data across geographically dispersed collaborators.

Abstract

To date, the BaBar experiment has stored over 0.7 PB of data in an Objectivity/DB database. Approximately half of this data set comprises simulated data, of which more than 70% has been produced at more than 20 collaborating institutes outside of SLAC. Managing such a large data set and providing timely access to physicists is a challenging and complex problem. We describe the operational aspects of managing such a large distributed data set, as well as importing and exporting data from geographically spread BaBar collaborators. We also describe problems common to dealing with such large data sets.

Taxonomy

Topics: Particle physics theoretical and experimental studies · Medical Imaging Techniques and Applications