A distributed data warehouse system for astroparticle physics
Minh-Duc Nguyen (1), Alexander Kryukov (1), Julia Dubenskaya (1), Elena Korosteleva (1), Stanislav Polyakov (1), Evgeny Postnikov (1), Igor Bychkov (2), Andrey Mikhailov (2), Alexey Shigarov (2), Oleg Fedorov (3), Yulia Kazarina (3), Dmitry Shipilov (3)

TL;DR
This paper presents a distributed data warehouse system tailored for astroparticle physics experiments, enabling efficient on-demand data access and integration across multiple large-scale data sources.
Contribution
It introduces a novel implementation using CernVM-FS with custom components for data search and subset delivery, enhancing data accessibility for scientists.
Findings
Efficient on-demand data retrieval from multiple experiments.
User-friendly interface for data access with proper permissions.
Integration of data sets across experiments for comprehensive analysis.
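To make the idea of on-demand subset delivery concrete, here is a minimal sketch, not the authors' implementation. It assumes each experiment exposes a catalog of file records with an observation date, and it selects only the records in a requested date range across all catalogs. The `FileRecord` type, the `select_subset` function, the file paths, and the sizes are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata entry for one data file of an experiment."""
    experiment: str
    path: str
    observed: date
    size_bytes: int


def select_subset(catalogs, start, end, experiments=None):
    """Return records observed within [start, end], merged across catalogs.

    `catalogs` maps an experiment name to a list of FileRecord objects.
    `experiments`, if given, restricts the search to those experiment names.
    """
    selected = []
    for records in catalogs.values():
        for rec in records:
            if experiments is not None and rec.experiment not in experiments:
                continue
            if start <= rec.observed <= end:
                selected.append(rec)
    # Order by date, then experiment, so the merged result is deterministic.
    return sorted(selected, key=lambda r: (r.observed, r.experiment))


# Illustrative catalogs; all paths and sizes are made up.
catalogs = {
    "TAIGA": [
        FileRecord("TAIGA", "/taiga/2019/run001.dat", date(2019, 1, 10), 2_000_000),
        FileRecord("TAIGA", "/taiga/2019/run002.dat", date(2019, 3, 5), 3_000_000),
    ],
    "KASCADE-Grande": [
        FileRecord("KASCADE-Grande", "/kascade/evt_a.dat", date(2019, 1, 20), 1_500_000),
        FileRecord("KASCADE-Grande", "/kascade/evt_b.dat", date(2018, 12, 30), 1_000_000),
    ],
}

subset = select_subset(catalogs, date(2019, 1, 1), date(2019, 1, 31))
total_bytes = sum(r.size_bytes for r in subset)
```

In a design of this kind, only the files in `subset` would need to be transferred to the user, rather than each experiment's full archive.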
Abstract
A distributed data warehouse system is one of the pressing needs in the field of astroparticle physics. Well-known experiments such as TAIGA and KASCADE-Grande produce tens of terabytes of data measured by their instruments. It is critical to have a smart on-site data warehouse system that stores the collected data effectively for further distribution. It is also vital to provide scientists with a user-friendly interface for accessing the collected data with proper permissions, both on-site and online. Online access is especially useful when scientists need to combine data from different experiments for analysis. In this work, we describe an approach to implementing a distributed data warehouse system that allows scientists to acquire just the necessary data from different experiments via the Internet on demand. The implementation is based on CernVM-FS with additional components developed by…
Taxonomy
Topics: Big Data Technologies and Applications · Data Quality and Management · Advanced Database Systems and Queries
