Astronomical data organization, management and access in Scientific Data Lakes
Y.G. Grange, V.N. Pandey, X. Espinal, R. Di Maria, and A.P. Millar (on behalf of ESCAPE WP2)

TL;DR
This paper discusses the development of a Scientific Data Lake prototype for astronomical data, leveraging distributed data management tools inspired by particle physics to improve data organization, access, and FAIR compliance.
Contribution
It introduces a novel Scientific Data Lake prototype tailored for astronomy, adapting tools from the LHC computing grid to address domain-specific data management challenges.
Findings
Prototype demonstrates effective handling of astronomical data use cases.
Integration of distributed storage and authentication improves data accessibility.
Aligns astronomical data management with FAIR and Open Data standards.
Abstract
The volume of data stored in telescope archives is constantly increasing owing to the development and improvement of instrumentation. Often these archives must be stored on a distributed storage architecture provided by independent compute centres. Such a distributed data archive requires overarching data management orchestration. This orchestration comprises tools that handle data storage and cataloguing and that steer transfers between different storage systems and protocols, while remaining aware of data policies and locality. In addition, it needs a common Authorisation and Authentication Infrastructure (AAI) layer that end users perceive as a single entity and that provides transparent data access. The scientific domain of particle physics also uses complex, distributed data management systems. The experiments at the Large Hadron Collider (LHC) accelerator at CERN…
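To make "aware of data policies and locality" concrete, the following is a toy model of policy-driven replica placement: a rule asks for a number of copies of a dataset on sites with given attributes, and the planner counts copies already in place before choosing new destinations that have enough free space. All names here (`Site`, `ReplicationRule`, `plan_transfers`, the attribute keys and site names) are hypothetical illustrations. This is not the API of any tool used in the prototype, only a minimal sketch under those assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A storage endpoint at a (hypothetical) compute centre."""
    name: str
    attributes: dict
    free_tb: float
    holds: set = field(default_factory=set)  # datasets already stored here


@dataclass
class ReplicationRule:
    """Hypothetical policy: keep `copies` replicas on sites matching `required`."""
    dataset: str
    size_tb: float
    copies: int
    required: dict  # attribute name -> value a site must have


def eligible(site: Site, rule: ReplicationRule) -> bool:
    """A site is eligible if it matches every attribute the rule requires."""
    return all(site.attributes.get(k) == v for k, v in rule.required.items())


def plan_transfers(sites: list, rule: ReplicationRule) -> list:
    """Return the names of sites that must receive a new replica."""
    candidates = [s for s in sites if eligible(s, rule)]
    # Locality: replicas that already exist on eligible sites count towards the rule.
    existing = [s for s in candidates if rule.dataset in s.holds]
    missing = rule.copies - len(existing)
    if missing <= 0:
        return []
    # New copies only go to eligible sites with room; prefer the emptiest ones.
    targets = sorted(
        (s for s in candidates
         if rule.dataset not in s.holds and s.free_tb >= rule.size_tb),
        key=lambda s: s.free_tb,
        reverse=True,
    )
    if len(targets) < missing:
        raise RuntimeError(
            f"cannot satisfy rule for {rule.dataset}: "
            f"need {missing} new copies, found {len(targets)} sites"
        )
    return [s.name for s in targets[:missing]]
```

For example, with one existing disk copy of a 20 TB dataset and a rule asking for two disk copies, only one transfer is planned, and a disk site with less than 20 TB free is skipped. Keeping the rule declarative and letting the planner derive transfers mirrors the separation between policy and data movement that the abstract describes.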
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Scientific Computing and Data Management
