Relational databases for data management in PHENIX
I. Sourikova, D. Morrison

TL;DR
This paper evaluates relational database systems for managing large, distributed experimental data in PHENIX, emphasizing data consistency, accessibility, and collaborative updates across multiple sites.
Contribution
It provides an analysis of various RDBMS options and shares practical insights from implementing a distributed file catalog for PHENIX.
Findings
Relational databases can effectively support distributed data catalogs.
Performance and reliability depend on the chosen RDBMS and implementation strategies.
The experience informs best practices for large-scale scientific data management.
Abstract
PHENIX is one of the two large experiments at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory (BNL) and archives roughly 100 TB of experimental data per year. In addition, large volumes of simulated data are produced at multiple off-site computing centers. For a file catalog to play a central role in data management, it must handle distributed access and updates. To be used effectively by the hundreds of PHENIX collaborators in 12 countries, the catalog must satisfy the following requirements: 1) contain up-to-date data, 2) provide fast and reliable access to the data, and 3) grant write permissions to the sites that store portions of the data. We present an analysis of several available Relational Database Management Systems (RDBMS) to support a catalog meeting the above requirements and discuss the PHENIX experience with…
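The three catalog requirements listed in the abstract can be made concrete with a small sketch. The following is not the PHENIX schema or implementation, which the abstract does not describe. It is a minimal illustration using Python's built-in sqlite3 module, in which the table layout, the function names (make_catalog, register_replica, locate), the site label "OFFSITE", and the file paths are all hypothetical. It shows one way a relational table could record which site holds which file, while allowing a site to write only the rows that describe its own storage.

```python
import sqlite3


def make_catalog():
    """Create an in-memory catalog with one row per (file, site) replica.

    The schema is an assumption for illustration, not the PHENIX design.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE files (
            lfn        TEXT    NOT NULL,  -- logical file name
            site       TEXT    NOT NULL,  -- site that stores this replica
            path       TEXT    NOT NULL,  -- physical location at that site
            size_bytes INTEGER NOT NULL,
            PRIMARY KEY (lfn, site)
        )
        """
    )
    return conn


def register_replica(conn, writer_site, lfn, site, path, size_bytes):
    """Insert or update a replica record.

    A site may only write rows describing its own storage (requirement 3).
    """
    if writer_site != site:
        raise PermissionError(f"{writer_site} may not write entries for {site}")
    # The connection context manager commits on success and rolls back on
    # error, so readers never see a half-written update (requirement 1).
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (lfn, site, path, size_bytes),
        )


def locate(conn, lfn):
    """Return (site, path) for every replica of a file.

    The lookup is served by the primary-key index (requirement 2).
    """
    cur = conn.execute(
        "SELECT site, path FROM files WHERE lfn = ? ORDER BY site", (lfn,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    catalog = make_catalog()
    register_replica(catalog, "BNL", "run1.root", "BNL", "/data/run1.root", 10)
    register_replica(catalog, "OFFSITE", "run1.root", "OFFSITE", "/sim/run1.root", 10)
    print(locate(catalog, "run1.root"))
```

A single in-memory database sidesteps the distributed part of the problem entirely. In a multi-site deployment, the ownership check would more plausibly be enforced by the database's own access controls and replication setup than by application code.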
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Data Quality and Management
