Smart caching in a Data Lake for High Energy Physics analysis
Tommaso Tedeschi, Diego Ciangottini, Marco Baioletti, Valentina Poggioni, Daniele Spiga, Loriano Storchi, Mirco Tracolli

TL;DR
This paper proposes an autonomous, reinforcement-learning-based method for managing data caching in High Energy Physics Data Lakes, with the aim of improving the user experience and containing infrastructure maintenance costs as data volumes grow.
Contribution
It introduces a novel reinforcement learning approach to autonomous data caching management in distributed Data Lake environments for High Energy Physics.
Findings
Improved data access efficiency in Data Lakes.
Reduced maintenance costs through autonomous caching.
Enhanced user experience in data retrieval.
Abstract
The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially in scenarios where end users, as well as the resources they can access, are distributed worldwide. This work focuses on data caching management in a Data Lake infrastructure in the context of High Energy Physics. We propose an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.
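The abstract does not describe the learning formulation. Purely to illustrate what reinforcement-learning-driven cache management can look like, here is a minimal sketch, not the authors' method: on every cache miss, a tabular agent decides whether to admit the requested file into an LRU cache. Everything specific here is a hypothetical assumption: the class name `QAdmissionCache`, the state (how many times the file was requested before, capped at 2), the two actions (skip or admit), and the reward scheme (+1 when an admitted file is hit, -1 when a skipped file is requested again, -1 when an admitted file is evicted without being hit). File sizes are ignored and the update uses a discount factor of zero.

```python
import random
from collections import OrderedDict, defaultdict

SKIP, ADMIT = 0, 1  # hypothetical action encoding


class QAdmissionCache:
    """Hypothetical LRU file cache whose admission decisions are learned with tabular Q-values."""

    def __init__(self, capacity, alpha=0.1, epsilon=0.1, seed=0):
        self.capacity = capacity                  # max number of files held (file sizes ignored)
        self.alpha = alpha                        # learning rate
        self.epsilon = epsilon                    # exploration rate for epsilon-greedy choice
        self.rng = random.Random(seed)            # seeded for reproducibility
        self.cache = OrderedDict()                # file -> None, least recently used first
        self.freq = defaultdict(int)              # number of requests seen per file
        self.q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(skip), Q(admit)]
        self.pending = {}                         # file -> (state, action) still awaiting a reward
        self.hits = 0

    def _state(self, f):
        # Assumed state: times f was requested before, capped at 2 (buckets 0, 1, 2+).
        return min(self.freq[f], 2)

    def _resolve(self, f, reward):
        # One-step Q update (discount factor 0) for f's pending decision, if any.
        if f in self.pending:
            state, action = self.pending.pop(f)
            self.q[state][action] += self.alpha * (reward - self.q[state][action])

    def _choose(self, state):
        # Epsilon-greedy action selection; ties favour admission.
        if self.rng.random() < self.epsilon:
            return self.rng.choice((SKIP, ADMIT))
        q_skip, q_admit = self.q[state]
        return ADMIT if q_admit >= q_skip else SKIP

    def request(self, f):
        """Serve one request for file f; return True on a cache hit."""
        state = self._state(f)
        self.freq[f] += 1
        if f in self.cache:
            self.hits += 1
            self.cache.move_to_end(f)
            self._resolve(f, +1.0)  # admitting f paid off with a hit
            return True
        self._resolve(f, -1.0)      # f was skipped earlier and is needed again: missed hit
        action = self._choose(state)
        self.pending[f] = (state, action)
        if action == ADMIT:
            if len(self.cache) >= self.capacity:
                victim, _ = self.cache.popitem(last=False)  # evict the LRU file
                self._resolve(victim, -1.0)                 # evicted with no hit since admission
            self.cache[f] = None
        return False


if __name__ == "__main__":
    # Toy workload: a few "hot" files mixed with many one-off "cold" files.
    rng = random.Random(42)
    cache = QAdmissionCache(capacity=10)
    n = 10_000
    for _ in range(n):
        if rng.random() < 0.5:
            f = f"hot{rng.randrange(8)}"
        else:
            f = f"cold{rng.randrange(100_000)}"
        cache.request(f)
    print(f"hit ratio: {cache.hits / n:.3f}")
```

With a zero discount factor the update reduces to a contextual-bandit form of Q-learning. One limitation of this sketch: a skipped file that is never requested again produces no reward, so correct skips are never credited; a fuller design would need some way to reward them, for example by closing pending decisions after a time window.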
Taxonomy
Topics: Data Quality and Management · Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data
