Access Trends of In-network Cache for Scientific Data
Ruize Han, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Justas Balcas, Harvey Newman

TL;DR
This paper analyzes access patterns of a federated scientific data cache, demonstrating significant network traffic reduction and high predictability of cache usage through machine learning, which can inform future in-network caching strategies.
Contribution
It provides the first detailed study of a federated scientific data cache's access patterns and shows how machine learning can predict cache utilization with high accuracy.
Findings
The cache reduces network traffic by a factor of 2.35.
Machine learning models predict cache utilization with an accuracy of 0.88.
Access patterns are sufficiently predictable for managing in-network caching.
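To make the "factor of 2.35" concrete, here is a minimal sketch of one common way to compute a traffic reduction factor: total bytes served to users divided by the bytes that still had to be fetched over the wide-area network on cache misses. This definition, the function name, and the byte volumes are assumptions for illustration only; they are not taken from the paper's data.

```python
def traffic_reduction_factor(hit_bytes: float, miss_bytes: float) -> float:
    """Ratio of bytes requested by users to bytes fetched over the network.

    Assumed definition (not confirmed by the summary above):
    (bytes served from cache + bytes fetched on misses) / bytes fetched on misses.
    """
    if miss_bytes <= 0:
        raise ValueError("miss_bytes must be positive")
    return (hit_bytes + miss_bytes) / miss_bytes


# Hypothetical volumes (e.g. in TB), chosen only so the ratio matches 2.35.
factor = traffic_reduction_factor(hit_bytes=135.0, miss_bytes=100.0)
print(f"traffic reduction factor: {factor:.2f}")
```

Under this assumed definition, a factor of 2.35 would mean that for every byte transferred over the network, users read 2.35 bytes in total, with the remainder served from the cache.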
Abstract
Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often are working on related research topics that require similar data objects. Thus, there is a significant amount of data sharing possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and potential for network traffic reduction by this caching system, we aim to explore the predictability of the cache uses and the potential for a more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic…
